ABSTRACT

As the speed and complexity of computer networks evolve, sharing network 
resources becomes increasingly important. Thus, the issue of how to allocate 
the available bandwidth among the multitude of users needs to be addressed. 
Such allocation needs to be in some sense efficient and fair to different users. In 
this work the so-called maxmin fairness is chosen as the optimality criterion. 
A new distributed and asynchronous algorithm is suggested. The algorithm 
is shown to converge to the optimal rate allocation in a network with general 
topology under dynamic changes in the set of network users, individual user 
load and occasional route changes. An upper bound on convergence time 
is given. The algorithm is shown to be well-behaved in transience. Unlike 
previous work, the algorithm takes bandwidth consumed by feedback traffic 
into account. Further, an extension of the algorithm is suggested to address 
the problem of policing misbehaved users.

1 Introduction

This section discusses the design decisions adopted in this work, describes the
existing results for the chosen model, summarizes the main results of this
work, and finally gives a brief layout of the remaining sections.

1.1 Background 

There has been extensive debate in the literature about the relative merits and 
drawbacks of open-loop control schemes versus closed-loop control schemes. 
The large ratio of propagation delay to packet transmission time in modern
high-speed networks poses a significant challenge for any end-to-end feedback
scheme [21], [24], [25]. As a result, a number of open-loop alternatives like 
prior reservation and switch-based controls have been suggested. 

Prior reservation schemes are generally considered to be suitable for 
steady stream-like traffic with a priori known resource requirements. Reser- 
vation also provides quality of service guarantees that are difficult to achieve 
with walk-in service. The price for this, however, is the lack of flexibility in the 
presence of dynamic changes in the network load leading to potential waste of 
precious network resources. 

Switch-based controls have been shown to be necessary for achieving 
fairness [27]. However, if no source-based control is exercised, the sources may
continue to inject excessive traffic into the network, causing more overload and 
wasting network resources. 

The approach adopted in this work is based on the cooperation between 
the sources and the network in sustaining an acceptable network load from the 
standpoint of fairness and efficiency. This approach is similar to that of [23] 
and [26]. 

The problem of load allocation is twofold: the sources must be able
to determine their optimal load, and the network must ensure that even if
all sources operate at their optimal rates, these rates are enforced across the 
network. The mechanism of such enforcement strongly depends on the shape 
of source traffic, a particular flow control mechanism, and service discipline 
of all switches in the network. There is a vast amount of work addressing
the issue of preserving feasible user rates under various assumptions on the
underlying service discipline and the shape of source traffic (see, for example,
[3], [11], [12], [14], [25]).

This thesis addresses the first problem, i.e. how to determine the set of 
optimal rates in a distributed network under dynamic changes in the absence 
of centralized knowledge about the network and without synchronization of 
different network components. 

We consider a system in which switches maintain their own controls. 
Switches communicate these controls to the source by feedback. We consider 
an end-to-end feedback scheme, in which the destination generates feedback 
packets which deliver the aggregate feedback signal from all switches on the 
packet's route back to the source. Upon receipt of the feedback signal, the 
source adjusts its load accordingly. The details of the algorithm are discussed 
later in this work. 

We show that the algorithm is very general in nature and is applicable 
to a broad range of service disciplines and underlying traffic shapes. This 
flexibility is largely due to our choice to decouple the problems of determining 
the optimal rates and enforcing them. 

1.2 Route Selection 

It is assumed that at any time of the algorithm operation the route of each 
session is unique. We allow the route to change from time to time (for example 
in response to equipment failures or due to some routing decisions), but we
disallow the existence of more than one route at any given time. We assume that
the changes in the route do not occur very often, and that in the absence of 
network failures the routes will eventually stabilize for a given set of network 
users. 

The obvious argument against this approach is that the best load al- 
location for a particular choice of session routes may not be the best over all 
possible route choices. However, even if session routes are unique, the prob- 
lem of optimal rate allocation is non-trivial and deserves proper attention. In 
addition, note that the algorithm is shown to be robust in the presence of dy- 
namic route changes. Thus, it can be run in conjunction with any independent 
routing algorithm which will eventually stabilize to some route. As soon as 
the route is found, our algorithm will recover from any past changes and will 
converge to the optimal rates for this route.

1.3 Network Model 

In the real world endnodes are interconnected through a complex network of 
switches. There can be many users physically located at one network node, 
each of them conducting perhaps several communication sessions with other 
users in the network. Some of those sessions can be bi-directional, like a
conversation; others can be uni-directional, like a file transfer.

For our purposes, we assume that all sessions are independent. More- 
over, we simply treat one user conducting several sessions as several inde- 
pendent users. Similarly, we treat any bi-directional data exchange as two 
independent uni-directional ones. 

We assume that any two connected nodes in the network are connected 
by a pair of half-duplex links of identical capacity pointing in the opposite 
directions. In general different link pairs have different capacities. 

It is assumed that each enduser is connected to exactly one switch.
Endusers are not connected directly to each other. A switch can be connected
to zero or more endusers of any type and to zero or more other switches. It 
is assumed that there is a path going through one or more switches from any 
source to its respective destination. 

It is convenient to make a distinction between the 'entry' links into 
the network and all other links. The 'entry' links are in essence artificially 
created in the model to separate different users located at one network node. 
While capacities of all other links are real physical restrictions, capacities of 
the 'entry' links can be chosen as we please as long as they do not impose 
additional restrictions on session flows. It will be seen later that it is convenient 
to consider these capacities to be equal to the session demand. Thus we allow 
these capacities to be infinite if session demand is infinite. Capacities of all 
other links are assumed finite. 

Similarly, the model creates an artificial switch per endnode located 
at the entry into the network. This switch has two 'real' half-duplex links 
connecting it to the network and 2m artificial half-duplex links, connecting it 
to m endusers located at the real-life endnode. 

1.4 Optimality Criterion 

The goal of this work is to determine a fair and efficient rate allocation. The
precise meaning of the terms 'efficient' and 'fair' has been a target of extensive 
debate in the last two decades. References [1], [21], [24], [8], [7] contain a 
variety of approaches and definitions of fairness and efficiency. 

The approach adopted in this work chooses the so-called maxmin or 
bottleneck optimality criterion discussed in various modifications in [1], [12], 
[17], [23], [26]. 

This approach is based on the following intuition. 

Consider a network with given link capacities, a set of sessions, and
fixed session routes. We are interested in rate allocations that are feasible
in the sense that the total throughput of all sessions crossing any link does 
not exceed the link's capacity. We would like the feasible rate allocation to be 
fair to all sessions. On the other hand we want the network to be utilized as 
much as possible. 

We now define a fair allocation in the following way. We consider all
"bottleneck" links, i.e., the links with the smallest capacity available per session.
A strict definition is given in section 2. We share the capacity of these
links equally between all sessions crossing them. Then we remove these sessions 
from the network and reduce all link capacities by the bandwidth consumed 
by the removed sessions. We now identify the "next level" bottleneck links 
of the reduced network and repeat the procedure. We thus continue until all 
sessions are assigned their rates. 
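
The procedure just described can be sketched in a few lines of Python. This is only an illustrative sketch under simplifying assumptions: it ignores feedback traffic, treats all sessions as greedy, and assumes finite positive capacities and non-empty routes. The names `maxmin_rates`, `capacity` and `routes` are introduced here for illustration and do not appear elsewhere in this work.

```python
def maxmin_rates(capacity, routes):
    """Global maxmin procedure, ignoring feedback traffic.

    capacity: dict mapping link -> capacity (finite, positive)
    routes:   dict mapping session -> non-empty set of links on its route
    Returns a dict mapping session -> assigned rate.
    """
    remaining = dict(capacity)                        # capacity left on each link
    active = {s: set(r) for s, r in routes.items()}   # sessions without a rate yet
    rates = {}
    while active:
        # Count the still-unassigned sessions crossing each link.
        load = {}
        for links in active.values():
            for link in links:
                load[link] = load.get(link, 0) + 1
        # Capacity available per session on every link that still carries sessions.
        share = {link: remaining[link] / n for link, n in load.items()}
        level = min(share.values())
        bottlenecks = {link for link, v in share.items() if v - level <= 1e-12}
        # Every session crossing a bottleneck link gets the bottleneck share.
        fixed = [s for s, links in active.items() if links & bottlenecks]
        for s in fixed:
            rates[s] = level
            for link in active[s]:
                remaining[link] -= level              # reduce the network
            del active[s]
    return rates
```

For example, with two links `A` (capacity 10) and `B` (capacity 4), where one session crosses only `A`, one crosses both, and one crosses only `B`, link `B` is the first bottleneck and gives 2 to each of its two sessions; the remaining session then receives the 8 units left on `A`.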

Such a rate vector is known as a maxmin fair allocation. The above global
synchronized procedure for achieving maxmin optimal rates is well known and 
is described for instance in [1], [23]. 

It can be easily seen that the rate allocation obtained in such a way 
is fair in the sense that all sessions constrained by a particular bottleneck get 
an equal share of this bottleneck capacity. It is also efficient in the sense that 
given the fair allocation, no more data can be pushed through the network, 
since each session crosses at least one fully saturated link. 

Assuming that packets are infinitely small and the flows are determin- 
istic, it can be seen that maxmin fairness implies maximum efficiency in the 
sense that the bottleneck resource is utilized up to its capacity and no queues 
build up. 

It is well known, however, that for packets of finite size and a general
distribution of packet arrival and service times, utilizing a link to its full
capacity leads to unbounded queue growth and causes severe performance
degradation. Thus, in this case utilizing the bottleneck to its full capacity is
not efficient, and a different efficiency criterion is called for.

Reference [26] introduces the optimal efficiency criterion for a general 
network configuration as the maximum power of the bottleneck resource, where 

\[ \text{Resource Power} = \frac{\text{Bottleneck Resource Throughput}}{\text{Bottleneck Resource Response Time}} \]

The resource capacity at which the power is maximized is called the knee 
capacity. In general, the knee capacity depends on the particular distribution 
of the packet arrival times and service discipline. 
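
As a purely illustrative case (an assumption made here, not a model used in this work), consider a single M/M/1 queue with service rate $\mu$ and arrival rate $\lambda < \mu$. Its mean response time is $1/(\mu - \lambda)$, so the power and its maximizer would be

\[ P(\lambda) = \frac{\lambda}{1/(\mu - \lambda)} = \lambda(\mu - \lambda), \qquad \frac{dP}{d\lambda} = \mu - 2\lambda = 0 \;\Longrightarrow\; \lambda^{*} = \frac{\mu}{2}. \]

Under this assumption the knee lies at half of the raw capacity; other arrival and service distributions generally give a different knee.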

However, if the knee capacity is known, then applying the global pro- 
cedure for determining maxmin fair rates described above to the network with 
the knee capacities replacing the original capacities, we can obtain the rate 
allocation which is fair in the sense that the bottleneck resources are still 
shared equally among their users and efficient in the sense that the bottleneck 
resource power is maximized. 

In summary, provided the knee capacities are known, we can use maxmin 
optimality on the network with knee capacities for both efficiency and fairness. 

In practice, the knee capacities are not known a priori. As a result,
either an a priori estimate is required, or an independent algorithm for congestion
detection must operate in parallel to provide this estimate "on the
fly" [26]. Another approach might be to combine the two by choosing some
conservative estimate of the knee capacity and then attempt to adjust it if the 
link detects that it is constantly underutilized. 

For the purposes of this work we assume that the knee capacities are 
known. Moreover, we will use the word "capacity" to mean the "knee capacity" 
unless otherwise indicated.

1.5 Service Discipline

The only assumption we make about the service discipline employed by the 
switch is that the packets of each session are served in FIFO order. Thus, the 
switches could be strict FIFO, FIFO+, Priority, Stop-and-Go, Fair-Queuing, 
etc. We emphasize that the reason for such flexibility is that the algorithm 
presented in this work is a calculation algorithm and is not concerned with 
enforcement of the rates. Such enforcement is strongly dependent on the 
service discipline. 

In addition, we allow the switches to drop packets as they please, as 
long as at least some packets of each session continue to get through. While 
dropping packets can cause a lot of wasteful retransmissions, it is essential to 
note that our algorithm will still calculate correct optimal rates even in the 
presence of heavy packet loss. This property seems very important, since it 
means that the algorithm is robust in the presence of data loss due to heavy 
temporary congestion. 

1.6 Previous Work and Summary of Results 

The procedure for achieving maxmin optimal rates described earlier uses
global information, which is expensive and difficult to maintain in real-world
networks.

Several feedback schemes have been proposed to achieve the same goal 
in a distributed network. In essence, all these schemes maintain some link 
controls at the switch level and convey some information about these controls 
to the source by means of feedback. Upon receipt of the feedback signal the 
source adjusts its estimate of the allowed transmission rate according to some 
rule. 

These algorithms essentially differ in the particular choices of link controls
and the type of feedback provided to the source by the network.

References [6], [15], [17] describe distributed algorithms of this type.
However, these algorithms require synchronization, which is difficult to achieve.

Mosley in [23] suggested an asynchronous algorithm for distributed 
calculation of maxmin fair rates. The algorithm was shown to converge to 
maxmin optimal rates. However, convergence of the algorithm was rather
slow, and simulations showed poor adaptation to dynamic changes in the
network.

Later Ramakrishnan, Jain and Chiu in [26] suggested a distributed
asynchronous algorithm for achieving maxmin optimal rates which uses a
different type of feedback. Each switch still calculates a fair rate allocation for
all sessions crossing its outgoing links, but this allocation is not explicitly
communicated to the source. Instead, a bit is set in the packet's header if the
session's current flow across the link exceeds the current value of the link's fair
allocation. When the source receives packets with the bit set, it decreases its rate;
otherwise it increases it. The algorithm has the attractive property of using
just one bit in the packet header for feedback. It has been extensively tested
in a variety of real-life network configurations and has been demonstrated
to be fair and efficient even under dynamic network changes. However, while
the simulation results are extremely favorable, no theoretical guarantees on 
the algorithm convergence to an optimal operating point in a general network 
topology are available. Moreover, since the optimal rates are not provided to 
the sources, the algorithm produces oscillations around the optimal rate and 
it may take a long time to get close to the optimal solution. 

The approach adopted in this work requires explicit calculation of the 
optimal rates. It defines a family of link control calculation policies and a 
feedback mechanism which ensure convergence to maxmin optimal rates from 
any initial conditions. An algorithm employing any of these policies is shown 
to be self-stabilizing in the sense that it recovers from any past errors, changes
in the set of network users, individual session demands, and session routes.

It is demonstrated that convergence of the algorithm is generally faster 
than that of the algorithms described earlier in this section. An upper bound
on convergence time is provided. 

In addition, it is shown that the algorithm is 'well-behaved' in tran- 
sience. In particular, it is shown that given an upper bound on round-trip 
delay, the actual transmission rates can be kept feasible throughout the tran- 
sient stages of algorithm operation while still providing reasonable throughput 
to all users. 

These qualities are extremely important in a dynamic network where 
changes in user load caused by newly arrived sessions can cause infeasibility 
which must be quickly taken care of to avoid large queue buildup and perfor- 
mance degradation. 

We also suggest a mechanism for policing misbehaved users. 

In addition, unlike previous work, we take into account the bandwidth 
consumed by feedback traffic. 

Simulation results demonstrate that the algorithm works well under 
dynamic changes in the network load. 

1.7 Outline 

Section 2 contains the formal definition of the optimality criterion and
provides a global procedure for determining optimal rates in the presence of real
feedback traffic.

Section 3 contains the description of the distributed algorithm. 

Section 4 gives the convergence theorem. 

Section 5 discusses the transient behavior of the algorithm and provides 
an upper bound on convergence time. 

Section 6 gives the results of several simulation experiments.

Section 7 contains a discussion of some related issues and suggests
an extension of the algorithm to policing misbehaved users.

Section 8 summarizes the results and gives some suggestions for future 
research. 

2 Optimality Criterion 

2.1 Definition of the MAXMIN Optimum 

It seems natural to consider only static rate allocations for possible candidates 
for an optimal allocation. Once such optimal allocation is defined, our goal 
can be formulated as finding an algorithm to dynamically control an arbitrary 
rate allocation to bring it as close to the static optimum as possible. 

We start by defining the feasible set of rate allocations $\eta$ as follows:

\[ \eta_i \ge 0 \tag{1} \]

\[ \sum_{i \in \mathcal{G}_j} (u_{ij} + k w_{ij})\, \eta_i \le C_j \tag{2} \]

where $\eta_i$ is the transmission rate of session $i$, $C_j$ is the capacity of link $j$,
$u_{ij} = 1$ if session $i$ crosses link $j$ on its forward route and $0$ otherwise,
$w_{ij} = 1$ if session $i$ crosses link $j$ on its feedback route and $0$ otherwise,
and $\mathcal{G}_j$ is the set of all sessions crossing link $j$.

(1) simply states that we are not interested in negative transmission
rates, while (2) ensures that a rate allocation is such that no link capacity is
exceeded.

Now we can define the optimality criterion on this feasible set as follows.
We need the following definition first.

Definition 2.1 Consider a vector $a = (a_1, \ldots, a_n)$. Let $\bar{a} = (\bar{a}_1, \ldots, \bar{a}_n)$ be a
permutation of $a$ such that $\bar{a}_i \le \bar{a}_j$ if $i < j$, and define $\bar{b}$ analogously for a
vector $b$. Vector $b$ is said to be lexicographically greater than $a$ if either
$\bar{a}_1 < \bar{b}_1$ or $\exists\, 1 < j \le n$ s.t. $\bar{a}_i = \bar{b}_i \ \forall\, 1 \le i < j$
and $\bar{a}_j < \bar{b}_j$.

Now we define the maxmin optimal vector of transmission rates by 

Definition 2.2 Vector $\hat{\eta} = (\hat{\eta}_1, \ldots, \hat{\eta}_S)$ is called maxmin optimal for network
$\mathcal{N}$ if

• it satisfies restrictions (1) and (2)

• it is lexicographically greater than any other feasible solution of (1) and (2)

It can easily be seen that this definition in fact means that the opti- 
mal vector is such that its smallest component is maximized over all feasible 
vectors, then, given the value of the smallest component, the next smallest 
component is maximized, etc. 
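
The comparison in Definition 2.1 can be illustrated with a short Python sketch. The function name `lex_greater` is hypothetical and used only for this illustration; both vectors are sorted in non-decreasing order and the first differing component decides.

```python
def lex_greater(b, a):
    """Return True if b is lexicographically greater than a (Definition 2.1)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Compare the sorted permutations component by component.
    for x, y in zip(sorted(a), sorted(b)):
        if x != y:
            return x < y      # first differing sorted component decides
    return False              # identical after sorting: neither is greater
```

For instance, (2, 2, 8) is lexicographically greater than (1, 3, 8), because the smallest component 2 exceeds the smallest component 1, even though both vectors have the same sum.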

The next section describes a global procedure to obtain maxmin optimal 
rates for a network with feedback traffic. 

2.2 Finding Globally Optimal Rates 

In this section we give a way to find the stationary optimal vector $\hat{\eta}$ given
global information about the network. The results here are quite similar to
those given in [1], [23], [15], [26]. However, this work considers a somewhat
different model from that of the cited authors, since our model accounts for the
bandwidth consumed by feedback flows. Note that it is not clear a priori whether
it is legitimate to treat feedback sessions in the same way as independent
forward sessions, since their rates cannot be chosen independently of their
corresponding forward sessions.

For the sake of simplicity we consider the case of "greedy" sessions, i.e.
sessions with infinitely large demands. Note however, that the case of finite 
demands can be reduced to the "greedy" case by simply adding artificial links 
of capacity equal to the session demand at the entry of each session to the 
network. 

We start with the following definition. 

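Definition 2.3 Link $l$ is called a bottleneck with respect to network
$\mathcal{N}(\mathcal{L}, \mathcal{S})$ if

\[ \frac{C_l}{f_l + k b_l} = \min_{j \in \mathcal{L}} \frac{C_j}{f_j + k b_j}, \]

where $f_j$ and $b_j$ are the numbers of sessions of $\mathcal{S}$ crossing link $j$ on their
forward and feedback paths, respectively.

Note that this is slightly different from the traditional definition of a
bottleneck link. In our definition, we allocate a link's capacity between sessions
(forward and feedback) sharing this link in such a way that each session is
allocated $C_l / (f_l + k b_l)$ on its forward path.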

Optimal stationary rates can now be found by the following procedure. 

We find all bottleneck links of the network, set the transmission
rates of all the sessions crossing these links in either direction to $C_l / (f_l + k b_l)$, and
mark those sessions. Then we decrease the capacities of all links by the total
capacity consumed by the marked sessions crossing these links on their forward
or feedback paths. We consider a reduced network with all link capacities
adjusted as above and with marked sessions removed. We repeat the procedure
until all sessions are marked.
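
A minimal Python sketch of this marking procedure follows. It is an illustration under stated assumptions rather than a definitive implementation: sessions are greedy, capacities are finite, every session crosses at least one link, and a session of rate $r$ is assumed to consume $r$ on each link of its forward path and $k r$ on each link of its feedback path, with $k > 0$. The names `global_optimum`, `forward` and `feedback` are introduced here for illustration only.

```python
def global_optimum(capacity, forward, feedback, k):
    """Feedback-aware maxmin rates by repeated bottleneck removal.

    capacity: dict link -> capacity
    forward:  dict session -> set of links on the forward path
    feedback: dict session -> set of links on the feedback path
    k:        weight of feedback traffic relative to forward traffic (k > 0)
    """
    remaining = dict(capacity)
    unmarked = set(forward)
    rates = {}
    while unmarked:
        # Weighted number of unmarked sessions on each link: f + k*b.
        weight = {}
        for s in unmarked:
            for link in forward[s]:
                weight[link] = weight.get(link, 0.0) + 1.0
            for link in feedback[s]:
                weight[link] = weight.get(link, 0.0) + k
        share = {link: remaining[link] / w for link, w in weight.items()}
        tau = min(share.values())
        bottlenecks = {link for link, v in share.items() if v - tau <= 1e-12}
        # Mark sessions crossing a bottleneck link in either direction.
        marked = {s for s in unmarked
                  if (forward[s] | feedback[s]) & bottlenecks}
        for s in marked:
            rates[s] = tau
            for link in forward[s]:
                remaining[link] -= tau
            for link in feedback[s]:
                remaining[link] -= k * tau
        unmarked -= marked
    return rates
```

With two links of capacity 10, two sessions whose forward and feedback paths use opposite links, and k = 0.25, each link carries one forward and one feedback flow, so both sessions receive 10 / 1.25 = 8.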

This procedure can be formalized as follows. 

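PROCEDURE GLOBAL OPTIMUM
Given network $\mathcal{N}(\mathcal{L}, \mathcal{S})$

START:

Denote:

$L_1$ - set of all links $l \in \mathcal{L}$ s.t. at least one session of $\mathcal{S}$ crosses $l$
on its forward or feedback path

$\mathcal{L}_1$ - set of all links $l \in L_1$ s.t. $\frac{C_l}{f_l + k b_l} = \min_{j \in L_1} \frac{C_j}{f_j + k b_j}$

$\tau_1 = \frac{C_j}{f_j + k b_j}$ for any $j \in \mathcal{L}_1$

$\mathcal{S}_1$ - set of sessions crossing at least one link of $\mathcal{L}_1$

$f_l^1$ - number of sessions of $\mathcal{S}_1$ crossing link $l$ on forward path

$b_l^1$ - number of sessions of $\mathcal{S}_1$ crossing link $l$ on feedback path

ITERATION i:

Given:

$\mathcal{S}_1, \ldots, \mathcal{S}_{i-1}$,

$\mathcal{L}_1, \ldots, \mathcal{L}_{i-1}$,

$b_l^1, \ldots, b_l^{i-1}$,

$f_l^1, \ldots, f_l^{i-1}$,

$\tau_1, \ldots, \tau_{i-1}$

Define:

$\bar{\mathcal{S}}_{i-1} = \mathcal{S}_1 \cup \ldots \cup \mathcal{S}_{i-1}$,

$\bar{\mathcal{L}}_{i-1} = \mathcal{L}_1 \cup \ldots \cup \mathcal{L}_{i-1}$,

$L_i$ - set of all links $l \in \mathcal{L} \setminus \bar{\mathcal{L}}_{i-1}$ s.t. at least one session of $\mathcal{S} \setminus \bar{\mathcal{S}}_{i-1}$ crosses
this link on its forward or feedback path

$\mathcal{L}_i$ - set of all links $l \in L_i$ s.t.

\[ \frac{C_l - \sum_{m=1}^{i-1} \tau_m (f_l^m + k b_l^m)}{f_l + k b_l - \sum_{m=1}^{i-1} (f_l^m + k b_l^m)} = \min_{j \in L_i} \frac{C_j - \sum_{m=1}^{i-1} \tau_m (f_j^m + k b_j^m)}{f_j + k b_j - \sum_{m=1}^{i-1} (f_j^m + k b_j^m)} \]

$\mathcal{S}_i$ - set of sessions of $\mathcal{S} \setminus \bar{\mathcal{S}}_{i-1}$ crossing at least one link of $\mathcal{L}_i$

$\forall\, l \in \mathcal{L}_i$:

\[ \tau_i = \frac{C_l - \sum_{m=1}^{i-1} \tau_m (f_l^m + k b_l^m)}{f_l + k b_l - \sum_{m=1}^{i-1} (f_l^m + k b_l^m)} \]

$\bar{\mathcal{S}}_i = \mathcal{S}_i \cup \bar{\mathcal{S}}_{i-1}$

$\bar{\mathcal{L}}_i = \mathcal{L}_i \cup \bar{\mathcal{L}}_{i-1}$

$f_l^i$ - number of sessions of $\mathcal{S}_i$ crossing link $l$ on forward path.

$b_l^i$ - number of sessions of $\mathcal{S}_i$ crossing link $l$ on feedback path.

If $\mathcal{S} = \bar{\mathcal{S}}_i$, then STOP
Else perform iteration i + 1

END of GLOBAL OPTIMUM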

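Theorem 2.1

1. Procedure Global Optimum terminates in a finite number
of iterations.

2. When the procedure terminates, all sessions are assigned their globally
optimal rates.

3. Let $\tau_i$ be the optimal rates assigned at iteration $i$. Then $\tau_1 < \ldots < \tau_m$.

4. Let $\mathcal{L}_i$ and $\mathcal{S}_i$ be the set of bottleneck links of the reduced network
of iteration $i$ and the set of sessions crossing these links, respectively. Then any
session in $\mathcal{S}_i$ crosses at least one link in $\mathcal{L}_i$.

5. Only sessions from $\mathcal{S}_1 \cup \ldots \cup \mathcal{S}_i$ go through any link in $\mathcal{L}_i$, $\forall\, 1 \le i \le m$.

6. $\forall\, 1 \le i \le m$:

\[ \tau_i = \frac{C_j - \sum_{m'=1}^{i-1} \tau_{m'} (f_j^{m'} + k b_j^{m'})}{f_j + k b_j - \sum_{m'=1}^{i-1} (f_j^{m'} + k b_j^{m'})} \quad \forall\, j \in \mathcal{L}_i \]

\[ \tau_i < \frac{C_j - \sum_{m'=1}^{i-1} \tau_{m'} (f_j^{m'} + k b_j^{m'})}{f_j + k b_j - \sum_{m'=1}^{i-1} (f_j^{m'} + k b_j^{m'})} \quad \forall\, j \in L_i \setminus \mathcal{L}_i \]

where $L_i$ is the set of links $l$ in $\mathcal{L} \setminus (\mathcal{L}_1 \cup \ldots \cup \mathcal{L}_{i-1})$ s.t. at least one session
of $\mathcal{S} \setminus (\mathcal{S}_1 \cup \ldots \cup \mathcal{S}_{i-1})$ crosses $l$.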

The proof of this theorem is given in Appendix 2. 

3 Distributed Algorithm 

3.1 Assumptions and Goals 

Section 2.2 provided a way to determine the optimal rates of a fixed
set of sessions using the global knowledge of the network. In addition, the 
global algorithm described there required synchronization of stages in which 
the optimal rates were assigned. 

This section presents an algorithm to achieve the same goal in a dis- 
tributed asynchronous way. 

We start with a few words about the assumptions of the model and the 
goals of the distributed algorithm. 

We now allow sessions to exit or enter as they please. However, to make 
the notion of optimal rates meaningful, we must assume that sessions do not
enter or exit too often, in the sense that there is an extended period of time in
which the set of sessions in the network is fixed. Then for this period we can 
define optimal rates as in section 2. 

Thus, we allow the sessions to go through a period of instability in 
which some sessions can enter and exit, and then to stabilize to some fixed 
set for an extended period of time. We want the algorithm to stabilize to 
the optimal transmission rates for this set. Once the network has reached its
current optimal state, we want it to remain there until new sessions enter or
old sessions exit. If some sessions exit or enter, the optimal rates over the new
set change as well. We want the algorithm to stabilize to the new optimal
rates. If the interval between changes in the set of network users is much longer
than the time required for the algorithm to converge, then the network will spend
most of the time in a currently optimal state.

3.2 High-Level Description 

The essential idea of the algorithm is to emulate the iterations of Procedure 
Global Optimum in a distributed asynchronous way. 

To achieve this we let all packets carry an estimate of the bandwidth 
available for the session. This estimate will be referred to as the packet's 
'stamped rate'. We stress here that in fact the algorithm does not require
that every packet carry the stamped rate. We use this assumption only for
simplicity. It will become clear that special control packets can be used to 
carry the stamped rate, or only a fraction of data packets can be used for this 
purpose. 

We let each link maintain its current estimate of the fair share of its 
own capacity, referred to as the link's 'advertized rate'. 

Initially the source sets the packet's stamped rate to some arbitrary
initial value. As the packet travels through the network, its stamped rate is
reset to the minimum of the packet's initial stamped rate and the advertized
rates of all links on the packet's round-trip route.

When a feedback packet returns to the source, the source adjusts its 
transmission rate according to the stamped rate of the feedback packet. 

Each link maintains a list of its users. It adds a session to this list when 
the first packet of a new session is received. It deletes a session from the list 
when it determines that a session has exited. We do not address the issue of
exactly how this determination is done. One could define a timeout value, or
let the sessions send a special "last" packet. For the purposes of this work, 
however, we ignore any details or issues associated with any such choice, and 
simply assume that there is some way a switch can recognize the fact that the 
session is no longer active. 

For each user the link stores its last seen stamped rate. It will be 
referred to as the 'recorded rate' of the session at the link. 

The link sets a bit in the session's entry if a packet of that session is 
received with stamped rate below or equal to the current advertized rate of 
the link. We say that a session with this bit set at some link is marked at the 
link. 

The link then calculates its advertized rate as

\[ \mu = \frac{C - \bar{C}}{f + k b - \bar{f} - k \bar{b}} \tag{3} \]

where $C$ is the capacity of the link, $\bar{C}$ is the capacity used by the last seen
stamped rates of the sessions marked at this link, and $f$, $b$, $\bar{f}$, $\bar{b}$ are the
numbers of total and marked forward and feedback sessions at the link, respectively.

It is essential that the set of marked sessions at any time must satisfy 
the following conditions: 

1. If any session is marked, its recorded rate is less than or equal to the 
advertized rate of the link. 

2. Advertized rate is calculated according to (3). 

The above conditions will be referred to as M-consistency, for "mark- 
ing" consistency. 

If at any time a session violates M-consistency, it must be immediately 
unmarked and the advertized rate must be recalculated. It turns out that M- 
consistency is central to ensuring convergence of the algorithm to optimal values
from any initial conditions. We will discuss M-consistency in more detail later
in this work. 
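
The following Python sketch illustrates one way a link might compute its advertized rate and restore M-consistency. It is a hedged illustration, not the link procedure of this work: the session records (dictionaries with hypothetical keys `feedback`, `marked`, `recorded`) and the function names are assumptions made here. When some sessions are unmarked it applies formula (3); the two special cases (no sessions known, all sessions marked) follow the three-case definition of the advertized rate given with the link data structures in Section 3.3.1, as read from the source. A positive feedback weight `k` is assumed.

```python
def _weight(session, k):
    # A forward session counts as 1, a feedback session as k.
    return k if session["feedback"] else 1.0

def advertized_rate(capacity, k, sessions):
    """Advertized rate of a link from its per-session records."""
    marked = [s for s in sessions if s["marked"]]
    total = sum(_weight(s, k) for s in sessions)              # f + k*b
    used = sum(_weight(s, k) * s["recorded"] for s in marked)  # capacity used by marked sessions
    if not sessions:
        return capacity                                        # no sessions known
    if len(marked) == len(sessions):                           # all sessions marked
        return capacity - used + max(s["recorded"] for s in marked)
    marked_total = sum(_weight(s, k) for s in marked)          # marked part of f + k*b
    return (capacity - used) / (total - marked_total)          # formula (3)

def enforce_m_consistency(capacity, k, sessions):
    """Unmark violating sessions until M-consistency holds; return the rate."""
    while True:
        mu = advertized_rate(capacity, k, sessions)
        violators = [s for s in sessions
                     if s["marked"] and s["recorded"] > mu]
        if not violators:
            return mu
        for s in violators:
            s["marked"] = False     # unmark, then recalculate the rate
```

The loop terminates because every pass that does not return unmarks at least one session, and a link knows only finitely many sessions.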

Finally, note that if the source's idea of a session's rate is
below the advertized rates of all links on the session's route, and the session's
demand is not satisfied, "obeying" the stamped rate received in the feedback
packet would cause the session to operate below its optimal rate. To avoid this
condition, an extra bit in the packet header is used. The bit will be referred to
as the "u-bit". A "greedy" session (i.e. a session whose demand is infinite or
unknown) sets the u-bit to 0 on all of its packets. A "conservative" session,
whose demand is known and finite, sets the u-bit of all its outgoing packets
to 1. If the packet's stamped rate is above or equal to the advertized rate of
a link in the packet's route, the link sets the u-bit to 1. Hence, if a feedback
packet returns with the u-bit set to 0, it means that the advertized rate of all links
was higher than the stamped rate of the packet. In this case the source ignores
the received stamped rate and resets its idea of the allowed rate to its demand.
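
The interplay of the stamped rate and the u-bit can be sketched as follows. This is an illustrative model only: the function names `traverse` and `source_receive` are hypothetical, link marking and recorded rates are left out, and the source's reaction follows the source operation given in Section 3.3.2.

```python
import math

def traverse(stamped, u_bit, advertized_rates):
    """Carry a packet over the links of its round-trip route, in order."""
    for mu in advertized_rates:
        if stamped >= mu:
            # The link limits this packet: lower the stamp and set the u-bit.
            stamped, u_bit = mu, 1
    return stamped, u_bit

def source_receive(demand, stamped, u_bit):
    """Return the source's new (stamped rate, u-bit) after a feedback packet."""
    if stamped > demand:               # demand must have decreased
        return demand, 1
    if u_bit == 0:                     # no link constrained the packet
        return demand, (1 if demand < math.inf else 0)
    return stamped, 0                  # obey the rate set by the network
```

A greedy source (demand `math.inf`) whose packet crosses links advertizing 10, 4 and 6 receives the pair (4, 1) and settles on rate 4. If the rates later rise to 10, 5 and 6, its next packet returns as (4, 0), and the source resets its stamped rate to its demand so that the new, higher limit can be discovered.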

There is no synchronization between the operation of different network
components. The next section contains a formal description of the algorithm.

3.3 Algorithm Description 

This section describes the data structures and operation of network components.
Where appropriate, 'pidgin C' code is used to describe component operation.
The code is not intended to be efficient and is sometimes redundant
for the sake of clarity of the underlying ideas.

3.3.1 Data Structures 

Packet p:

$u_p$ - 'u-bit' used to indicate that the session's rate can be increased

$\rho_p$ - packet's stamped rate

$t_p$ - packet's type (forward or feedback)

Source s:

$\rho_s$ - stamped rate of the last feedback packet received

$u_s$ - bit indicating whether or not to set the u-bit of outgoing packets

$d_s$ - demand of the session

Destination d:

$\rho_d$ - stamped rate of the last forward packet received

$u_d$ - 'u-bit' of the last forward packet received

$count_d$ - used for counting the number of unacknowledged forward packets

Link l:

$C_l$ - capacity of the link

$f_l$ - number of forward sessions known at the link

$b_l$ - number of feedback sessions known at the link

$\mathcal{G}_l$ - set of sessions known at the link

For any session $i \in \mathcal{G}_l$:

$a_l^i$ - bit used to mark the session at the link

$\delta_l^i$ - is equal to 1 if the session is forward and to $k$ if it is feedback
(Note that $k$ is a universal constant across the network)

$\xi_l^i$ - recorded rate of the session

$\mu_l$ - advertized rate of the link, calculated as

\[ \mu_l = \begin{cases}
C_l & \text{if } f_l + k b_l = 0 \\
C_l - \sum_{i \in \mathcal{G}_l} \xi_l^i \delta_l^i a_l^i + \max_{i \in \mathcal{G}_l} \xi_l^i & \text{if } f_l + k b_l = \sum_{i \in \mathcal{G}_l} \delta_l^i a_l^i \\
\dfrac{C_l - \sum_{i \in \mathcal{G}_l} \xi_l^i \delta_l^i a_l^i}{f_l + k b_l - \sum_{i \in \mathcal{G}_l} \delta_l^i a_l^i} & \text{otherwise}
\end{cases} \tag{4} \]



3.3.2 Source Operation 

sourceJnitialize (source s) { 

/* called at initialization time */ 

if (demand not known) set d s = oo, u s = 
Ps = d s 

if (d s < oo) 

u s = 1; 
else 

u s = 0; 
} 

source_receive_packet (source s, packet p) { 

/* called upon receipt of a feedback packet */ 

*/ (Pp > d s ) { /* must be that demand has decreased */ 

p s = D s ; 

u s = 1; 

} 

else if (u p == 0) { 
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(4) 



Ps = Ds; 

if(D s < oo) u s = 1; 
else u s = 0; 



} 

e/se { 



/* Packet passed at least one link whose advertized rate 
/* was equal to the packet's current stamped rate, 
/* so obey this rate 

Ps = Pp] 

u s = 0; 



source_generate_packet (source s, packet p){ 
create new packet p 

Pp = Ps 

Up u s 

add p to the outgoing link's output queue 

} 
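The source operation above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the class names `Packet` and `Source`, the field names, and the use of `math.inf` for an unknown demand are hypothetical choices made here, not part of the original description.

```python
import math
from dataclasses import dataclass

INF = math.inf  # assumption: an unknown demand is modelled as infinity


@dataclass
class Packet:
    rho: float            # stamped rate
    u: int                # u-bit
    forward: bool = True  # packet type (forward or feedback)


class Source:
    """Hypothetical model of the source state (rho_s, u_s, d_s)."""

    def __init__(self, demand=INF):
        self.d = demand
        self.rho = demand
        # the u-bit of outgoing packets is set only for a finite demand
        self.u = 1 if demand < INF else 0

    def receive_feedback(self, p):
        if p.rho > self.d:
            # demand must have decreased: fall back to the demand
            self.rho = self.d
            self.u = 1
        elif p.u == 0:
            # ignore the stamped rate and reset the allowed rate to the demand
            self.rho = self.d
            self.u = 1 if self.d < INF else 0
        else:
            # some link restricted the packet to its advertized rate: obey it
            self.rho = p.rho
            self.u = 0

    def generate_packet(self):
        return Packet(rho=self.rho, u=self.u, forward=True)
```

A source with finite demand 50 that receives feedback stamped 10 with the u-bit set adopts rate 10; a later feedback packet with the u-bit cleared makes it return to its demand.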

3.3.3 Destination Operation 

destination_initialize(destination d) {
    count_d = 0;
    rho_d = 0; u_d = 0;
}

destination_receive_packet(destination d, packet p) {
    /* called upon receipt of a forward packet */
    count_d = count_d + 1
    /* set parameters for the feedback packet as seen in the */
    /* last of the k packets to be acknowledged              */
    if (count_d == k) {
        rho_d = rho_p; u_d = u_p;
        count_d = 0;
        create new feedback packet p'
        rho_p' = rho_d; u_p' = u_d
        send p'
    }
}
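A minimal Python sketch of the destination operation follows, assuming k is an integer count of forward packets per feedback packet, as the pseudocode above uses it. The names `Destination` and `receive_forward` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    rho: float            # stamped rate
    u: int                # u-bit
    forward: bool = True  # packet type (forward or feedback)


class Destination:
    """Hypothetical model of the destination state (count_d, rho_d, u_d)."""

    def __init__(self, k):
        self.k = k        # forward packets acknowledged by one feedback packet
        self.count = 0
        self.rho = 0.0
        self.u = 0

    def receive_forward(self, p):
        """Return a feedback packet on every k-th forward packet, else None."""
        self.count += 1
        if self.count == self.k:
            # copy the control fields of the last of the k packets
            self.rho, self.u = p.rho, p.u
            self.count = 0
            return Packet(rho=self.rho, u=self.u, forward=False)
        return None
```

With k = 3, only the third forward packet triggers a feedback packet, and that packet carries the stamped rate and u-bit of the third packet.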



3.3.4 Switch Operation 

As soon as a packet arrives at an input link of a switch, it is added to the end
of the output queue of the appropriate outgoing link. If several packets are
received simultaneously from different input links, they are processed in some
random order.

3.3.5 Output Link Operation 



link_initialize(link l) {
    f_l = 0; b_l = 0; G_l = {};
    mu_l = C_l
}

link_action(link l, packet p) {
    /* called when packet p is at the head of the link's output queue */
    if any session i exited
        call link_update_session_exit(link l, session i)
    if (i_p not in G_l)    /* packet belongs to a new session */
        call link_update_new_session(link l, packet p)
    else                   /* packet belongs to a session already seen at the link */
        call link_update_known_session(link l, packet p)
    transmit packet p
}

link_update_session_exit(link l, session i) {
    /* update the list of known sessions */
    if (i is a forward session)
        f_l = f_l - 1
    else                   /* feedback session */
        b_l = b_l - 1
    G_l = G_l \ {i}
    /* recalculate advertized rate mu_l with updated information */
    mu_l = calculate_adv_rate(l);
}

link_update_new_session(link l, packet p) {
    /* update the list of known sessions */
    G_l = G_l U {i_p}
    delta_l[i_p] = t_p + k * (1 - t_p)
    f_l = f_l + t_p
    b_l = b_l + (1 - t_p)
    /* do not mark the new session */
    a_l[i_p] = 0;
    /* note that we do not need to set the recorded rate of the        */
    /* new session at this time since unmarked session's recorded rate */
    /* is not used in calculation of advertized rate                   */
    mu_l = calculate_adv_rate(l)
    if (rho_p > mu_l) {
        rho_p = mu_l;
        u_p = 1;
    }
    /* record the new rate now */
    xi_l[i_p] = rho_p;
}

link_update_known_session(link l, packet p) {
    if (rho_p > mu_l) {
        rho_p = mu_l;
        u_p = 1;
    }
    if (rho_p <= mu_l)
        a_l[i_p] = 1;
    mu_l = calculate_adv_rate(l);
}

calculate_adv_rate(link l)
{
    /* first calculate advertized rate with given set of marked sessions */
    RATE_CALCULATION: {
        if (f_l + k * b_l == 0)
            mu_l = C_l
        else if (f_l + k * b_l == sum over j in G_l of delta_l[j] * a_l[j])
            mu_l = C_l - (sum over j in G_l of xi_l[j] * delta_l[j] * a_l[j])
                       + (max over i in G_l of xi_l[i])
        else
            mu_l = (C_l - sum over j in G_l of xi_l[j] * delta_l[j] * a_l[j])
                   / (f_l + k * b_l - sum over j in G_l of delta_l[j] * a_l[j])
    }
    unmark any session whose recorded rate is above the calculated advertized rate
    repeat RATE_CALCULATION once more and return
}

4 Convergence Theorem



In section 3.2 we introduced the notion of M-consistent calculation of the
advertized rate of the link. Essentially, M-consistency means that once the
advertized rate is calculated with some set of marked sessions, no session
remains marked with a recorded rate exceeding the advertized rate. Function
calculate_adv_rate() in the algorithm description provides a possible way to
perform M-consistent calculation. Note that the result of the M-consistent
calculation is not only the advertized rate but also the set of "marked" sessions.
Lemma 4.1 below proves that the result of this function is in fact M-consistent.
The Convergence Theorem given later in this section proves that, given any
M-consistent advertized rate calculation, the algorithm described in the previous
section will converge to the optimal rate vector if started from arbitrary initial
conditions. Thus, function calculate_adv_rate() can be treated as a "black
box" and can be replaced by any other function providing an M-consistent result.
In essence, Convergence Theorem 4.1 proves convergence of a family of
algorithms with M-consistent link control calculation. We will return to the
issue of M-consistency in section 7, where we will also give another example
of M-consistent calculation.
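The two-pass calculation performed by calculate_adv_rate() can be sketched in Python as follows. This is an illustration under assumptions, not the thesis's implementation: sessions are represented as dictionaries with the hypothetical keys `delta` (weight, 1 or k), `xi` (recorded rate) and `a` (mark bit), and the link state is passed explicitly.

```python
def rate_calculation(C, sessions):
    """One pass of RATE_CALCULATION, following formula (4)."""
    total = sum(s["delta"] for s in sessions)  # f + k*b
    if total == 0:
        return C  # no sessions known at the link
    marked = [s for s in sessions if s["a"]]
    marked_load = sum(s["delta"] * s["xi"] for s in marked)
    marked_weight = sum(s["delta"] for s in marked)
    if len(marked) == len(sessions):
        # all sessions marked: leftover capacity plus the maximum recorded rate
        return C - marked_load + max(s["xi"] for s in sessions)
    # capacity left by marked sessions, shared among the unmarked ones
    return (C - marked_load) / (total - marked_weight)


def calculate_adv_rate(C, sessions):
    """Two-pass calculation; unmarks sessions in place and returns the rate."""
    mu1 = rate_calculation(C, sessions)
    for s in sessions:
        if s["a"] and s["xi"] > mu1:
            s["a"] = 0  # recorded rate above the first estimate: unmark
    return rate_calculation(C, sessions)
```

For example, with C = 20 and three forward sessions with recorded rates 2 (marked), 12 (marked) and one unmarked session, the first pass gives 6, the session recorded at 12 is unmarked, and the second pass gives 9; the only session still marked has recorded rate 2, which does not exceed 9.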

Lemma 4.1 After any link state update the advertized rate of the link and the 
marking of the sessions known at that link are M-consistent. 

Proof of Lemma 4.1. Consider any link update. Let $\mathcal{Y}$ be the set
of sessions marked at the beginning of this update. Let $\mu^1$ be the result of
the first advertized rate calculation in function calculate_adv_rate(). Let
$\mathcal{Z}$ denote the set of sessions which happen to be marked with stamped rates
greater than $\mu^1$. By operation of function calculate_adv_rate() all sessions in
$\mathcal{Z}$ will be unmarked. Then, if not all sessions are marked, the final advertized
rate $\mu$ returned by function calculate_adv_rate() is calculated as

$$
\mu = \frac{C - \sum_{i \in \mathcal{Y} \setminus \mathcal{Z}} \xi_i \delta_i}
           {f + kb - \sum_{i \in \mathcal{Y} \setminus \mathcal{Z}} \delta_i}
    = \frac{C - \sum_{i \in \mathcal{Y}} \xi_i \delta_i + \sum_{i \in \mathcal{Z}} \xi_i \delta_i}
           {f + kb - \sum_{i \in \mathcal{Y}} \delta_i + \sum_{i \in \mathcal{Z}} \delta_i}
    \ge \frac{C - \sum_{i \in \mathcal{Y}} \xi_i \delta_i + \mu^1 \sum_{i \in \mathcal{Z}} \delta_i}
           {f + kb - \sum_{i \in \mathcal{Y}} \delta_i + \sum_{i \in \mathcal{Z}} \delta_i}
    = \mu^1
$$

The last equality can be easily checked, since
$\mu^1 (f + kb - \sum_{i \in \mathcal{Y}} \delta_i) = C - \sum_{i \in \mathcal{Y}} \xi_i \delta_i$.
Since all sessions which remain marked after the second advertized rate
calculation in calculate_adv_rate() have recorded rates below or equal to
$\mu^1 \le \mu$, the statement of the lemma follows.
If all sessions are marked, the statement of the lemma trivially holds, since
by (4) the advertized rate is greater than or equal to the maximum recorded rate
of its sessions.

Theorem 4.1 Given arbitrary initial conditions on the states of all links in 
the network, states of all sources, destinations and arbitrary number of packets 
in transit with arbitrary control information written on them, the algorithm 
given in section 3 converges to the optimal rates as long as the set of sessions, 
their demands and routes eventually stabilize. 

Note that essentially any change in route or demand of the session is 
equivalent to an old session exiting and a new session entering. Thus, without 
loss of generality, the proof will be given under the assumption that demands 
and routes are fixed, but sessions are allowed to enter or exit, as long as 
eventually the set of sessions stabilizes. We give the proof for the case of 
infinite user demand. This does not cause any loss of generality, since as it has 
been already mentioned, the case of finite demands is reduced to this case by 
adding artificial links with capacities equal to session demands at the entry to 
the network. 

The proof of this theorem is based on the following 4 lemmas.

Lemma 4.2 After the set of sessions stabilizes at some time $t_0$ and all these
sessions have become known at all links in the network,

$$\mu_l(t) \ge \frac{C_l}{f_l + k b_l}$$

for all links $l$ and all times $t \ge t_0$. Here $f_l$ and $b_l$ are the number of forward
and feedback sessions crossing link $l$ respectively.

Proof of Lemma 4.2

In what follows the link index $l$ and the time argument $t$ are omitted.

Consider the time of any state update of link $l$ after $t_0$. By Lemma
4.1 the result of any link update is M-consistent, so any marked session $i$ has
recorded rate $\xi_i \le \mu$. Let $\mathcal{Y}$ denote the set of indices $j$ s.t. $a_j = 1$ (i.e. the
set of marked sessions). Then, for the case when not all sessions are marked,
by M-consistency

$$
\mu = \frac{C - \sum_{i \in \mathcal{Y}} \xi_i \delta_i}{f + kb - \sum_{i \in \mathcal{Y}} \delta_i}
    \ge \frac{C - \mu \sum_{i \in \mathcal{Y}} \delta_i}{f + kb - \sum_{i \in \mathcal{Y}} \delta_i}
$$

Hence, $\mu \ge \frac{C}{f + kb}$.

If all sessions are marked, two cases are possible. If
$\max_{i \in \mathcal{G}} \xi_i \ge \frac{C}{f + kb}$, then by M-consistency
$\mu \ge \max_{i \in \mathcal{G}} \xi_i \ge \frac{C}{f + kb}$, where $\mathcal{G}$ is the set of all sessions
crossing the link, and the statement of the lemma holds.

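If all $\xi_i < \frac{C}{f + kb}$, then by (4) and by M-consistency, using
$\sum_{i \in \mathcal{G}} \delta_i = f + kb$,

$$
\mu = \max_j \xi_j + C - \sum_{i \in \mathcal{G}} \xi_i \delta_i
    \ge C - (f + kb - 1) \max_j \xi_j
    \ge C - \frac{C}{f + kb} (f + kb - 1) = \frac{C}{f + kb}
$$

and the statement of the lemma follows.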

Lemma 4.3 Let $\tau_1$ denote the optimal rate of sessions in $\mathcal{S}_1$, where $\mathcal{S}_i$ is the
set of sessions whose optimal rates were assigned at iteration $i$ of Procedure
Global Optimum, and $\mathcal{L}_i$ the set of bottleneck links of this iteration. Let $t_0$,
$f_l$, $b_l$ be as in Lemma 4.2. Then for any $t \ge t_0$ it must be that

$$\mu_l(t) > \tau_1 \quad \forall\, l \in \mathcal{L} \setminus \mathcal{L}_1$$

Proof of Lemma 4.3

By Lemma 4.2, $\mu_l \ge \frac{C_l}{f_l + k b_l}$.

By Theorem 2.1, $\tau_1 = \frac{C_l}{f_l + k b_l}$ if $l \in \mathcal{L}_1$ and
$\tau_1 < \frac{C_l}{f_l + k b_l}$ if $l \in \mathcal{L} \setminus \mathcal{L}_1$.

This should be obvious since $\tau_1$ is the capacity per session of a first-level
bottleneck link, which by definition must be smaller than $\frac{C_l}{f_l + k b_l}$ of any other
link.

The statement of this lemma immediately follows.

The next Lemma states that there exists some time, after which all
sessions in $\mathcal{S}_1$ will have reached their optimal rate $\tau_1$ and will be marked with
this optimal rate at all links on their routes.

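Lemma 4.4 Let $\tau_1$, $\mathcal{S}_1$, $\mathcal{L}_1$ be as in Lemma 4.3. Then $\exists\, T_1 \ge 0$ s.t. $\forall\, t \ge T_1$

1. $\rho_{p_i} > \tau_1$ for any packet $p$ of session $i \in \mathcal{S} \setminus \mathcal{S}_1$.

2. $\xi_i^l > \tau_1$ for any session $i \in \mathcal{S} \setminus \mathcal{S}_1$ and link $l$ in the route of $i$ or its
feedback.

3. $\mu_l = \tau_1$ for any link $l \in \mathcal{L}_1$.

4. $\rho_s = \tau_1$ for the source $s$ of any session $j \in \mathcal{S}_1$.

5. $\rho_s > \tau_1$ for the source $s$ of any session $j \in \mathcal{S} \setminus \mathcal{S}_1$.

6. $\rho_{p_i} = \tau_1$ for any packet $p$ of session $i \in \mathcal{S}_1$.

7. $a_i^l = 1$, $\xi_i^l = \tau_1$ for all sessions $i \in \mathcal{S}_1$ and all links $l$ in the route of $i$
or its feedback.

Argument $t$ is omitted here.

The proof of this Lemma is given in Appendix 3.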






The result of this lemma will now be used as the base case for induction
on the index $i$ of $\mathcal{S}_i$. Note that this lemma states not only that the sessions in
$\mathcal{S}_1$ have reached their optimal rates, but also that these rates will never change and the
sessions will be marked at all links in their routes ever after (as long as the set
of sessions remains the same).

The inductive step is given by the following Lemma:

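Lemma 4.5 (Inductive Step). Suppose for some $1 \le i < m$
$\exists\, t_i \ge 0$ s.t. $\forall\, t \ge t_i$

1. $\mu_l = \tau_j$ for any link $l \in \mathcal{L}_j$, $1 \le j \le i$.

2. $\rho_s = \tau_j$ for the source $s$ of any session $k \in \mathcal{S}_j$, $1 \le j \le i$.

3. $\rho_s > \tau_i$ for the source $s$ of any session $j \in \mathcal{S} \setminus (\mathcal{S}_1 \cup \ldots \cup \mathcal{S}_i)$.

4. $\rho_{p_k} = \tau_j$ for any packet $p$ of session $k \in \mathcal{S}_j$, $1 \le j \le i$.

5. $a_k^l = 1$, $\xi_k^l = \tau_j$ for all sessions $k \in \mathcal{S}_j$, $1 \le j \le i$, and all links $l$ in the
route of $k$ or its feedback.

6. $\rho_{p_j} > \tau_i$ for any packet $p$ of session $j \in \mathcal{S} \setminus (\mathcal{S}_1 \cup \ldots \cup \mathcal{S}_i)$.

7. $\xi_j^l > \tau_i$ for any session $j \in \mathcal{S} \setminus (\mathcal{S}_1 \cup \ldots \cup \mathcal{S}_i)$ and link $l$ in the route of $j$
or its feedback.

Then $\exists\, t_{i+1} \ge 0$ s.t. $\forall\, t \ge t_{i+1}$ conditions 1-7 hold for $i + 1$.
It is assumed that the set of sessions has stabilized by time $t_i$.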

Proof of Lemma 4.5.

By inductive hypothesis all sessions in $\mathcal{S}_j$, $1 \le j \le i$, have reached
their optimal rates $\tau_j$ and these rates do not change as long as the set of
sessions remains unchanged. Moreover, by inductive hypothesis any session in
$\mathcal{S}_j$, $1 \le j \le i$, is marked with its optimal rate $\tau_j$ at any link on its way for all
times $t \ge t_i$. Therefore, the capacity of any link $l$ in the network available for all
sessions in $\mathcal{S} \setminus \hat{\mathcal{S}}$, where $\hat{\mathcal{S}} = (\mathcal{S}_1 \cup \ldots \cup \mathcal{S}_i)$, is
$\hat{C}_l = C_l - \sum_{j \in \hat{\mathcal{S}} \text{ crossing } l} \delta_j \tau_j$ $\forall\, t \ge t_i$.

Consider a reduced network with links $\mathcal{L} \setminus \hat{\mathcal{L}}$, where
$\hat{\mathcal{L}} = (\mathcal{L}_1 \cup \ldots \cup \mathcal{L}_i)$, sessions $\mathcal{S} \setminus \hat{\mathcal{S}}$, and link capacities
$\hat{C}_l = C_l - \sum_{j \in \hat{\mathcal{S}} \text{ crossing } l} \delta_j \tau_j$. Note that it
is legitimate to consider the reduced network precisely because by inductive
hypothesis all sessions in $\mathcal{S}_1 \cup \ldots \cup \mathcal{S}_i$ have stabilized at their optimal rates
and are forever marked at all links with their optimal recorded rates.

Denote $\hat{f}_l = f_l - \sum_{j=1}^{i} f_l^j$ and $\hat{b}_l = b_l - \sum_{j=1}^{i} b_l^j$.

Note that by Theorem 2.1 $\tau_{i+1} = \frac{\hat{C}_l}{\hat{f}_l + k \hat{b}_l}$ for all $l \in \mathcal{L}_{i+1}$ and
$\tau_{i+1} < \frac{\hat{C}_l}{\hat{f}_l + k \hat{b}_l}$ for all $l \in \mathcal{L} \setminus \mathcal{L}_{i+1}$.

Now the argument of Lemma 4.3 can be repeated for the reduced network
to show that $\mu_l \ge \tau_{i+1}$ $\forall\, l \in \mathcal{L}_{i+1}$ and $\mu_l > \tau_{i+1}$ $\forall\, l \in \mathcal{L} \setminus \mathcal{L}_{i+1}$.

Then, repeating the proof of Lemma 4.4 for the reduced network, we
show that all the statements of Lemma 4.5 hold.

The statement of Theorem 4.1 follows by induction. 

It is important to emphasize the role of M-consistent link control
calculation in the proof. The fact that after any link state update there were no
marked sessions with recorded rates above the newly calculated link control
value ensured that the link control was eventually increased to allow sessions
with higher rates to regain capacity unused by sessions with lower rates.

5 Transient Behavior 

There are several important implications of the proof of Theorem 4.1.

Note that the algorithm ensures that the optimal rates of sessions are
assigned in stages analogous to the stages of Procedure Global Optimum
described in section 2.2. Lemmas 4.4 and 4.5 provide the framework for obtaining
a worst case bound on the number of round-trips required for convergence. In
particular, these lemmas imply that provided no new changes occur in the
network,

• eventually all sessions of $\mathcal{S}_1$ get their optimal rates

• once all sessions of $\mathcal{S}_1 \cup \ldots \cup \mathcal{S}_i$ get their optimal rates set, these rates
are not going to change

• once all sessions of $\mathcal{S}_1 \cup \ldots \cup \mathcal{S}_i$ get their optimal rates set, sessions in
$\mathcal{S}_{i+1}$ will eventually get their optimal rates.

It follows from the proof of Lemmas 4.4 and 4.5 that the time
required for all sessions of $\mathcal{S}_{i+1}$ to obtain their optimal rates after all sessions
in $\mathcal{S}_1 \cup \ldots \cup \mathcal{S}_i$ obtain their optimal rates is at most the time needed by any
session to complete 4 round-trips (see the comment at the end of the proof
of Lemma 4.4 in Appendix 3). Suppose now that the round-trip delay is
bounded by some D. Then the following upper bound holds:

Proposition 5.1 Given an upper bound D on round-trip delay and the number
N of iterations of the global procedure, the upper bound on the algorithm
convergence time from any initial conditions is given by 4ND.

Note that the number of iterations of the global procedure is exactly 
equal to the number of different rates in the optimal rate vector. 
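The staged structure of the global procedure can be illustrated with a standard progressive-filling computation of maxmin rates. The sketch below is an assumption-laden illustration, not a transcription of Procedure Global Optimum from section 2.2: it assumes infinite demands, ignores feedback traffic (no weighting by k), and uses the hypothetical function name `maxmin_stages`. Each returned stage corresponds to one iteration, so the number of stages plays the role of N.

```python
def maxmin_stages(capacity, routes):
    """Progressive filling for infinite demands.

    capacity: dict link -> capacity
    routes:   dict session -> non-empty list of links it crosses
    Returns a list of (rate, sessions fixed at this stage), one per iteration.
    """
    cap = dict(capacity)
    active = set(routes)
    stages = []
    while active:
        # equal share of the remaining capacity on every link still in use
        share = {}
        for link, c in cap.items():
            n = sum(1 for s in active if link in routes[s])
            if n:
                share[link] = c / n
        rate = min(share.values())
        bottlenecks = {link for link, v in share.items() if v == rate}
        # every active session crossing a bottleneck link is fixed at `rate`
        fixed = {s for s in active if bottlenecks & set(routes[s])}
        for s in fixed:
            for link in routes[s]:
                cap[link] -= rate
        active -= fixed
        stages.append((rate, sorted(fixed)))
    return stages
```

For two links A (capacity 10) and B (capacity 30) with sessions s1 on A, s2 on A and B, and s3 on B, the first stage fixes s1 and s2 at rate 5 and the second stage fixes s3 at rate 25, so there are two stages and two distinct rates.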

It can also be easily seen that if the network operates at the optimum
and then a new session comes in or exits, it takes at most 4D(N − M) for
the algorithm to converge to a new set of optimal rates. Here N is again
the number of iterations of the global procedure for the new set, and M is the
"seniority" of the session, i.e. the index of the iteration of the global procedure
at which the optimal rate of this session is assigned. This is simply because by
operation of the algorithm sessions of lower optimal rates will not be affected
by the newly arrived or departed session.
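The two bounds can be evaluated directly. The helper names below are hypothetical; the only facts used are the expressions 4ND and 4D(N − M) and the remark that N equals the number of different rates in the optimal rate vector.

```python
def convergence_bound(optimal_rates, D):
    """Worst-case convergence time 4*N*D from arbitrary initial conditions.

    N is taken as the number of distinct values in the optimal rate vector.
    """
    N = len(set(optimal_rates))
    return 4 * N * D


def reconvergence_bound(N, M, D):
    """Worst-case time 4*D*(N - M) after a session of seniority M enters or exits."""
    return 4 * D * (N - M)
```

With D measured in round-trips (D = 1), a rate vector with 15 distinct rates gives a bound of 60 round-trips and one with 3 distinct rates gives 12.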

A few words are due about this bound. 

First, this bound gives the theoretical worst-case guarantee. In practice,
convergence time should be expected to be significantly better. In fact, in our 
simulations we were not able to produce convergence worse than 2N round- 
trips. However, this result is tight in the sense that it is possible to create a 
very unfortunate sequence of packet transmissions to attain this bound. 

Second, the convergence time measured in round-trips does not give a
good insight into the actual convergence time measured in real time units if
the round-trip delay D is not satisfactorily bounded. Given a feasible
set of transmission rates, a network configuration, a particular underlying
service discipline, and the source's traffic shaping mechanism, we could hope
to be able to obtain such a bound either from experiment or from theoretical
analysis. References [25], [11] provide such upper bounds for particular service
disciplines and source traffic shapes. Note also that for feasible constant-rate 
smooth flows of infinitely small packets D is simply the network propagation 
delay. 

However, unless special measures are taken, the transient infeasibility 
of transmission rates can cause significant queue growth, and, as a result, 
can significantly increase the upper bound on D and slow convergence of the 
algorithm (as measured in real time rather than the number of round-trips). 
Thus it would be very important to ensure that the algorithm produces a 
feasible transmission vector as early as possible. 

To get some intuition on this, consider a synchronized implementation
of the algorithm in which all sessions start simultaneously and each link
updates its state when it has received information from all of its users. Suppose
that all sessions are known at all links. Then it can be easily seen that after a
link updates all of its sessions (on their very first pass), and possibly resets the
stamped rates of the packets, the sum of all stamped rates on output does not
exceed the capacity of that particular link. Thus, in such a synchronized version,
when all packets return to their sources, the resulting vector of stamped rates
will be feasible. This feasibility will continue to hold if we look at the stamped
rate as it is received in the incoming feedback packet rather than as it is set in
the outgoing packets. (Note that the vector of stamped rates of the outgoing
packets will not be feasible if a source detects that it needs to
increase its rate upon receiving a packet with u-bit set to zero. In this case
the stamped rate of the outgoing packet will be set to the potentially very large
value of the session's demand.)

Hence, in the synchronized implementation we could preserve the fea- 
sibility of the actual rate vector by simply setting the actual rate to the newly 
received stamped rate if the u-bit is set to 1 and preserving the old actual 
rate otherwise. If in addition we choose initial transmission rates conserva- 
tively to ensure initial feasibility, we would ensure feasibility of the algorithm 
in transience. Clearly, as soon as the stamped rates converge to their optimal 
values, this policy will reset the actual rates to the optimal values at most one 
round-trip later. 

However, in an asynchronous implementation this policy will not
necessarily ensure feasibility. This can be seen by observing that a faster session
can increase its actual transmission rate before slower sessions realize that they
need to decrease theirs. We have not proved it rigorously, but we believe that
the heuristic described below will resolve this problem. The key point is that
the actual transmission rate does not have to be adjusted at the same time
as the stamped rate, since the stamped rate calculation is completely
decoupled from the underlying traffic. Suppose the value of D for the uncongested
network is available. Then,

• if the actual transmission rate needs to be decreased according to the
"synchronized" policy described above, decrease it immediately

• if the actual transmission rate needs to be increased according to the
"synchronized" policy, wait for 2D before increasing it.

The rationale for this policy is that decreasing the rate cannot possibly 
violate feasibility, while increasing it can, if the other sessions are not yet 
aware of the increase. Since the stamped as opposed to the actual rate is 
in fact increased according to the original algorithm, on the next round-trip 
the new rate increase will be known at all links, so the other sessions will be 
notified about this change no later than on their next round-trip after that. 
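The heuristic can be sketched as a small controller for the actual transmission rate. The sketch rests on several assumptions that the text does not fix: the class and method names are hypothetical, time is an explicit argument, a stamped rate counts as a target only when its u-bit is 1 (the "synchronized" policy), and a newer increase target replaces a pending one and restarts the 2D wait.

```python
class ActualRateController:
    """Hypothetical controller: decrease at once, delay increases by 2*D."""

    def __init__(self, initial_rate, D):
        self.rate = initial_rate
        self.D = D
        self.pending = None  # (target rate, earliest time it may be applied)

    def on_feedback(self, now, stamped_rate, u_bit):
        if u_bit != 1:
            return self.rate  # keep the old actual rate
        if stamped_rate <= self.rate:
            # a decrease cannot violate feasibility: apply it immediately
            self.rate = stamped_rate
            self.pending = None
        else:
            # an increase is deferred until other sessions can learn of it
            self.pending = (stamped_rate, now + 2 * self.D)
        return self.rate

    def tick(self, now):
        """Apply a deferred increase once its waiting time has elapsed."""
        if self.pending is not None and now >= self.pending[1]:
            self.rate = self.pending[0]
            self.pending = None
        return self.rate
```

With D = 1, an increase requested at time 0 takes effect only at time 2, while a later decrease is applied as soon as the feedback arrives.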

The results of our simulation in fact suggest that such a policy may be too
conservative. In the experiments described in the next section we were unable 
to produce infeasibility for more than one round-trip after a new session entered 
the network even without implementing the above policy. More investigation 
on this issue is called for. 

It should also be noted that one round-trip after all sessions become
known at all links, any session's stamped rate is at least as large as the
minimum of its demand and the equal share of the capacity of the bottleneck
link for this session. Note that while this may be less than the optimal rate
of this session, this bound ensures that all sessions are guaranteed reasonable
throughput even before the optimal rates are obtained.

The above considerations in combination with the simulation results 
presented in the next section lead us to believe that the algorithm is "well- 
behaved" in transience. 

Finally, it is instructive to compare the behavior of this algorithm to
other algorithms for finding maxmin optimal rates. In particular we will look
at the Selective Binary Feedback Scheme (SBF) presented in [26] and the scheme
presented in [23]. While an extensive comparison is beyond the scope of this
work, we believe that the following simple example gives a good insight into the
comparative behavior of these algorithms.

Consider a network consisting of one bottleneck link of capacity C 
shared by two sessions with the desired demand D. We will now investigate 
the behavior of the three algorithms in such system. Since SBF scheme is 
designed to operate in the window environment, we "translate" the window 
size to the effective transmission rate by dividing the window size by the 
round-trip delay. 

For simplicity, we assume synchronized operation. We look at the case
when initial demands D > C. Suppose C = 20 and initial rates are $R_1 = 100$,
$R_2 = 50$. The optimal rate for both sessions is 10.

Selective Binary Feedback Scheme This scheme essentially computes the
same value for link controls (called "fair allocation" in [26]) as the
"advertized rate" in this thesis. However, instead of delivering the minimal
value of all link controls to the source, SBF sets a bit in the packet
header if the current demand of the session on the link is above the link
control value. If this bit is set in the feedback signal, the source adjusts
its effective rate to $R^{new} = c R^{old}$, where $0 < c < 1$. If the bit is 0, the
source sets the effective rate to $R^{new} = R^{old} + b$, where $b > 0$. Note that
the optimal rate in our example is C/2 for both sessions.

Clearly, if D − C/2 is large, it may take quite a few iterations to just
reduce the rate below C/2. Then the algorithm will additively increase
the effective rate until it gets above the current link control, etc. Note
that it is impossible to ensure feasibility, since the algorithm will
invariably have to eventually raise the rate above its feasible value to "feel"
its way to the optimum. The following sequence gives the effective rates
for the two sessions of our example for the case c = b = 0.5:

$R_1$: 100, 50, 25, 12.5, 13.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 10.0, 10.5, 11.0 ...

$R_2$: 50, 25, 12.5, 6.25, 6.75, 7.25, 7.75, 8.25, 8.75, 9.25, 9.75, 10.25, 5.125, ...

Mosley [23] calculates link controls as $\max_i r_i + \frac{C - \sum_i r_i}{n}$, where $r_i$ is the
"observed rate" of session $i$ across the link and $n$ is the number of sessions
crossing the link. As in our scheme, the minimal value of the link
controls is conveyed to the source, which then sets its transmission rate to
this value.

The following sequence gives the rates obtained at several iterations for
the sessions in our example:

$R_1$: 100, 25, 10, 10, 10 ...

$R_2$: 50, 25, 10, 10, 10 ...

Our Scheme As we have seen earlier, our scheme gets the optimal values on
the very first round-trip.

$R_1$: 10, 10, 10, 10, 10 ...

$R_2$: 10, 10, 10, 10, 10 ...

As indicated by this trivial example, it should be expected that the 
algorithm described in this thesis converges and reaches feasibility faster. This 
expectation should be intuitively clear for the following reasons. SBF, while 
taking information about individual sessions into account, does not have a way 
to efficiently use this information, since the source is only notified whether its 
rate must be increased or decreased, but is not told "how far to go". Mosley's 
scheme does inform the source about the rate it should transmit at, but the 
information needed to compute this rate is based on the aggregate rate of 
all sessions and the maximum rate of all sessions. In contrast, our algorithm 
takes full information about the individual session rates into account when 
calculating the rate estimate, and informs the source about this rate.

6 Simulation Results

Experiments 1-5 show that the algorithm described in this thesis allows the 
sessions to adapt quickly to the changes in network load due to sessions enter- 
ing or exiting the network or variation of active session demands. 

The MIT Advanced Network Architecture group's network simulator
was used in this work.

A number of simulation experiments have been performed on different 
configurations. The results of these experiments are provided in the following 
subsections. We demonstrate that the algorithm allows the sessions to adapt 
quickly to any changes in the network load and converge to the current optimal 
rate. 

All experiments were performed with deterministic packet transmission 
rates on an underutilized network. The main purpose of these experiments was 
to investigate the number of round-trips required for convergence in the presence
of dynamic network changes. Even in an underutilized network the actual
length of round-trips of different sessions and even different packets of any one 
session can be different due to different path lengths, event scheduling, etc. 
Thus, the number of round-trips seems to give more insight into the algorithm 
behavior than the actual time. We observed that in all our experiments the 
time of convergence was well below the worst-case theoretical bound obtained 
in section 5. 

In addition we investigated the feasibility of the transient solutions. 
We have not used the heuristic described in section 5. Even without it, the 
rate vector of stamped rates became feasible after one round-trip following a 
change in the network load. 

Finally, we investigated the behavior of the algorithm with link control
calculation used by the Selective Binary Feedback Scheme. Since no substantial
difference was observed, the results of these simulations are not shown.

Note that the stamped rate plotted in all figures of this thesis reflects
the stamped rate as seen at the input to the sources. That is, if we were to
follow the "synchronized" policy described in section 5, this stamped rate would in
fact reflect the actual transmission rate.

6.1 Experiment 1 

Network configuration of this experiment is shown in Figure 1. The purpose of
this experiment is to observe how the sessions adjust to the changing network 
load when sessions enter or exit the network. 

In this configuration 5 sessions share one bottleneck link. At time 0
all sessions begin transmitting their data. Their initial demands exceed the 
fair share of the bottleneck link. Eventually session 1 exits, while sessions 2-4 
continue to transmit. Then, some time later sessions 2 and 3 exit. Still some 
time later session 3 reenters, but this time its demand is below its fair share 
of the bottleneck link. Finally, session 2 reenters with its initial demand. 

Tables 1 and 2 contain the session demands and session optimal rates 
at different times of this experiment. 

Figures 2-6 show the behavior of the stamped rates of Sessions 1-5 as a 
function of time. Boxes on the plots correspond to the round-trip epochs. Note 
that the sessions quickly adjust to the changing network load when sessions 
enter or exit. Table 3 gives the number of round-trips each session took to 
converge to the new optimal rates after each change in the network load. Note 
that all these numbers are within the theoretical guarantee of 4. After the 
very first round-trip feasibility is preserved throughout the experiment. 

6.2 Experiment 2 

The setting of this experiment is similar to Experiment 1 (see Figure 7). 

There are 15 sessions sharing one bottleneck link. All sessions start
transmission at time 0. Given infinite demand, the fair share of the bottleneck
link is 10000. We consider the case when the demand of sessions 1-7 is
strictly below this value, while demand of sessions 9-15 is strictly above 10000. 
Demand of session 8 is exactly 10000. 

The demands are chosen so that the total demand is exactly equal to 
the bottleneck link capacity. As a result, the demand vector is also the optimal 
rates vector (see Table 4). 

In this scenario, it should be expected that sessions 1-8 will be able
to transmit at their demand immediately, while sessions 9-15 will regain
the capacity not used by the less demanding sessions and stabilize at
their demanded rate. Moreover, any of sessions 9-15 should wait for all sessions
with lower optimal rates to stabilize before it can reach its own optimal rate.
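The expectation for a single bottleneck link with finite demands can be checked with a small maxmin computation. The sketch below uses a hypothetical helper name and made-up demand values (Table 4 is not reproduced here), and it ignores feedback traffic.

```python
def single_link_maxmin(capacity, demands):
    """Maxmin rates on one link.

    capacity: link capacity
    demands:  dict session -> demand
    Sessions are served in order of increasing demand; a session whose demand
    fits within the current equal share gets its demand, the rest split what
    remains equally.
    """
    rates = {}
    remaining = capacity
    pending = sorted(demands, key=demands.get)
    while pending:
        share = remaining / len(pending)
        s = pending[0]
        if demands[s] <= share:
            rates[s] = demands[s]      # demand is below the fair share
            remaining -= demands[s]
            pending.pop(0)
        else:
            for t in pending:          # all remaining demands exceed the share
                rates[t] = share
            break
    return rates
```

When the total demand equals the link capacity, as in this experiment, the computed rate vector coincides with the demand vector; the sessions with the smallest demands are settled first and the others pick up the capacity they leave unused.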

Figures 8-10 show that the sessions behave exactly as expected. Note 
that this time the x-axis is the number of round-trips rather than the actual 
time. Since different round-trips may take different time, the plots of stamped 
rate versus time will not necessarily be as well aligned.

We see that the worst observed convergence time of 10 round-trips
(session 15) is drastically smaller than the theoretical guarantee of 60.

6.3 Experiment 3 

In this experiment we investigate the behavior of the network with 2 levels of 
bottleneck links. 

Configuration used in this experiment is shown in Figure 11. Sessions 
1-4 with large initial demands start transmission at time 0. Sessions 3 and 4 
share the first-level bottleneck link 2. Their optimal rates are 5000. Sessions 
1 and 2 share the second-level bottleneck links 1 and 2. Their optimal rates 
are 1000. 

Figures 12-15 show the behavior of sessions 1-4. As expected, the
sessions sharing the second-level bottleneck links take longer to converge to
their optimal rates than the sessions sharing the worst bottleneck.

Note the typical "climbing upstairs" behavior of the sessions sharing 
the second-level bottleneck (sessions 1 and 2). They operate at a lower rate 
until they realize that the first-level sessions operate at low rates and regain 
the free capacity. 

We still observe that the worst observed convergence time of 6 (session
2) is much smaller than the theoretical guarantee of 12.

6.4 Experiment 4 

This experiment is intended to investigate the behavior of the network with 
different number of hops in the routes of different sessions and different optimal 
rates. Configuration of this experiment is given in Figure 16. It is clear that 
the actual number of round-trips required for all sessions to converge to their 
optimal rates depends on the specific timing of the events in the system. In 
particular, clearly session 4 should receive its optimal rate after the very first 
round-trip. In contrast, session 1 may have to wait till all other sessions 
stabilize to their optimal rates before getting its optimal rate assigned. Since 
session 1 also has the shortest route, it should be expected that it will take
session 1 significantly more round-trips to converge than session 4. Similarly,
session 2 may have to wait for session 1, and session 3 may have to wait for 
sessions 1 and 2. 

Optimal rates of the sessions are given in Table 5. 

Figure 17 shows that the simulated behavior of session 4 is exactly as 
expected, that is, it converges to its optimal rate immediately after the first 
round-trip. Session 3 also converges after the first round-trip, which is faster 
than the worst-case behavior described above. The behavior of sessions 3 and 
4 corresponds to our expectations. Session 2 converges on the third 
round-trip, and session 1 takes 6 round-trips to converge. This is much better than 
the theoretical guarantee of 16. Again, we see that feasibility is preserved 
throughout the experiment. 

6.5 Experiment 5 

Configuration of this experiment is shown in Figure 18. This experiment is 
intended to examine a more sophisticated case in which different sessions have 
different number of links in their routes. In addition, there are 3 levels of 
bottleneck links here. We also investigate the response of the algorithm to 
dynamic changes. 

All 5 sessions enter at time 0. Then session 3 exits at time 15. At time 
48 sessions 1 and 2 exit, and, finally, at time 67 session 1 reenters. Demands 
of all sessions are large. 

Optimal rates of sessions at different times are given in Table 6. Figures 
19-23 show that all sessions quickly determine their optimal rates after load 
changes. Table 7 gives the number of round-trips required by each session to 
stabilize to the new optimal rate. 

We observe again that the maximum observed number of round-trips (7) 
required for the algorithm to stabilize to its optimal rate after a change in the 
network load is still significantly below the theoretical upper bound of 12. 

Note also that the feasibility is restored after the very first round-trip 
following a change in the set of network users. 

7 Discussion 

7.1 Remarks on M-consistency 

As mentioned earlier in this work, M-consistent rate calculation was essential 
in ensuring the algorithm convergence. We proved earlier that our advertized 
rate calculation policy was M-consistent. We emphasize that checking for 
violation of M-consistency and enforcing it at all times ensures full recovery 
from any previous data corruption. 

It is possible to construct other M-consistent policies. In this section we 
will show that the scheme for link control calculation in the Selective Binary 
Feedback Scheme (SBF) presented in [26] is also M-consistent. On each update, SBF 
calculates its link control $A_{fair}$, called "fair allocation", as 
follows. Let $J$ denote the set of "allocated" sessions, and $A_i$ denote the rate 
of session $i$ as observed by the link. 

• Initially it sets $A_{fair} = C/n$, $J = \emptyset$, where $n$ is the number of sessions crossing the link. 

• Then, if $A_i < A_{fair}$ for some unallocated session $i$, allocate $i$, i.e. set $J = J \cup \{i\}$ and calculate 

$$A_{fair} = \frac{C - \sum_{j \in J} A_j}{\text{number of unallocated sessions}}$$ 

Repeat until there are no sessions to allocate. 

Note that the "allocated" sessions in SBF are analogous to our "marked" 
sessions. 

It can be easily seen that SBF is also M-consistent. Thus, we can 
use SBF control calculation in our algorithm and get the same convergence 
results. In fact, we have done that in simulation and no significant changes in 
the algorithm behavior were observed. 

Note the significant similarity in the actions taken by the two schemes. 
The main difference, however, is that SBF does not keep track of previous 
allocations (or markings in our terminology) and recalculates them anew on 
each calculation. In contrast, our scheme remembers the past allocations and 
uses the past information for the new calculation. 

Note also that the two schemes may produce different values of link 
controls and different sets of marked sessions. To see this, consider 3 sessions with 
"observed", or "recorded", rates equal to 1, 1 and 5 on a link of capacity 6. It 
is entirely possible that in our algorithm one of the sessions with ("recorded") 
rate 1 is marked, while the other 2 are not. Our algorithm will then calculate 
the control value ("advertised rate") as $\frac{6-1}{2} = 2.5$. SBF, for 
these rates, will calculate its control value ("fair allocation") as 4 and will 
mark ("allocate") both sessions with observed rate 1. 
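
The arithmetic of this example can be checked with a short sketch. The function names `sbf_fair_allocation` and `advertised_rate` are hypothetical, the initial share is assumed to be the capacity divided by the number of sessions, and the bandwidth consumed by feedback traffic is ignored, so this is only an illustration of the two calculations quoted above, not the full link control of either scheme.

```python
def sbf_fair_allocation(capacity, rates):
    """SBF-style calculation: recompute the set of allocated sessions from scratch."""
    allocated = set()
    fair = capacity / len(rates)  # assumed initial value: equal share C / n
    changed = True
    while changed:
        changed = False
        for i, rate in enumerate(rates):
            if i not in allocated and rate < fair:
                allocated.add(i)
                unallocated = len(rates) - len(allocated)
                if unallocated == 0:
                    # Every session is allocated; keep the last value (assumption).
                    return fair, allocated
                fair = (capacity - sum(rates[j] for j in allocated)) / unallocated
                changed = True
    return fair, allocated


def advertised_rate(capacity, rates, marked):
    """Control value for a given, remembered set of marked sessions."""
    unmarked = len(rates) - len(marked)
    return (capacity - sum(rates[j] for j in marked)) / unmarked


rates = [1, 1, 5]
sbf_value, sbf_allocated = sbf_fair_allocation(6, rates)  # 4.0 with sessions 0 and 1 allocated
our_value = advertised_rate(6, rates, marked={0})         # (6 - 1) / 2 = 2.5
```

The only difference between the two calls is where the set of marked sessions comes from: SBF derives it anew on every calculation, while the second function takes it as remembered state.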

Thus, a family of link control calculation policies has been identified. 
If plugged into the algorithm suggested in this thesis, any policy of this family 
will ensure convergence and the transient properties described in the previous 
sections. 

7.2 Comments on the usage of the "Stamped Rate" Field in the Packet Header 

The algorithm makes substantial use of the field in the packet header, which 
we call the "stamped rate" of the packet. Note that if the actual transmission 
rates could be accurately measured on input to a switch and if newly calculated 
rates could be accurately enforced on output, the packets would no longer need 
to carry this field, thus reducing the overhead. However, despite the obvious 
overhead, the use of this field may be well justified for the following reasons. 

• It is very difficult to measure and enforce rates with sufficient accuracy. 
Such measurements and enforcement strongly depend on the underly- 
ing service discipline and the shape of underlying traffic. The effects of 
measurement and enforcement errors are unclear and need more investi- 
gation. 

• Rate measurements constitute significant operational overhead in every 
network node. It can be argued that measurements might be necessary 
anyway for rate enforcement for some service disciplines. While this is 
certainly true for some service disciplines, simple FIFO does not need to 
perform any measurements. Rate measurements may also be needed for 
policing misbehaved users. However, such policing can be done at the 
entry to the network only, thus relieving the rest of the network from 
the necessity of measuring the rates. 

• Having the stamped rate in the packets completely decouples the pro- 
cesses of rate calculation and rate enforcement. In particular, 

- the algorithm can be used with any underlying service discipline (as 
long as the packets of each session are served FIFO), e.g. FIFO, 
FIFO+, Round-Robin, Stop-and-Go, etc., with any flow control 
mechanism and any traffic shaper; 

- the algorithm is robust in the presence of heavy packet loss in the 
sense that it converges even if only some packets continue to get 
through; 

- the above quality allows the algorithm to run on top of any other 
algorithm, in some specialized control packets, if needed. 

These considerations in combination with the convergence and transient 
properties described earlier in this thesis suggest that the overhead of having 
an extra field in the packet header may be well justified by the benefits it 
produces. 

7.3 Policing Misbehaved Sessions 

The algorithm described so far suffers from the same problem as most of the 
other schemes using FIFO queues. That is, if a session violates the rules 
imposed by the algorithm due to malfunction, maliciousness or a different 
protocol used, it can flood the network with its data at the expense of well- 
behaved users. 

Note that when a feedback packet returns to the source, its stamped 
rate is set to the current estimate of the session's allowed rate. If the packet's 
'u-bit' is set, then the source is supposed to adjust its transmission rate to 
this stamped rate. If the 'u-bit' is not set, then the source is expected to send 
unmarked packets with some a-priori known initial rate. 

Thus, the first switch on the session's way, which is also the last switch 
on the feedback's way, can determine the session's allowed rate from the information 
in the last feedback packet of that session it has seen. It can then measure the actual 
transmission rate of the session and drop packets of this session if this rate is 
exceeded. 

Note that this is only needed at the entry to the network, so the switches 
in the middle of the network connected to other switches only do not need to 
worry about misbehaved users at all. 
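
A minimal sketch of such an entry policer follows. The text does not say how the actual rate is to be measured, so the token bucket used here is an assumption, as are the class name `EntryPolicer`, its method names and the `burst` parameter. The sketch only illustrates the idea of learning the allowed rate from the last feedback packet and dropping traffic that exceeds it.

```python
class EntryPolicer:
    """Per-session policer at the first switch (illustrative token bucket)."""

    def __init__(self, initial_rate, burst):
        self.initial_rate = initial_rate  # a-priori known rate used before any feedback
        self.burst = burst                # bucket depth, same units as packet size
        self.allowed = {}                 # session -> allowed rate from last feedback
        self.tokens = {}                  # session -> tokens currently available
        self.last_seen = {}               # session -> arrival time of last forward packet

    def on_feedback(self, session, stamped_rate, u_bit):
        # With the u-bit set the source must follow the stamped rate;
        # otherwise it is expected to send at the initial rate.
        self.allowed[session] = stamped_rate if u_bit else self.initial_rate

    def on_forward(self, session, size, now):
        """Return True to forward the packet, False to drop it."""
        rate = self.allowed.get(session, self.initial_rate)
        elapsed = now - self.last_seen.get(session, now)
        tokens = min(self.burst, self.tokens.get(session, self.burst) + rate * elapsed)
        self.last_seen[session] = now
        if size <= tokens:
            self.tokens[session] = tokens - size
            return True
        self.tokens[session] = tokens
        return False


policer = EntryPolicer(initial_rate=10.0, burst=10.0)
policer.on_feedback("s1", stamped_rate=5.0, u_bit=True)
decisions = [policer.on_forward("s1", size=10, now=t) for t in (0.0, 1.0, 2.0)]
```

In this run the session sends 10 units every time unit while its allowed rate is 5, so the packet at time 1.0 is dropped and the other two are forwarded.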

7.4 Network with Full-Duplex Links 

The assumption of this model was that each physical connection is modeled 
as two half-duplex links of identical capacity. 

It should be noted that the algorithm can be used for the case of the 
full-duplex links with the following modification. In essence, the link should 
no longer distinguish between the forward and the feedback sessions crossing 
it. Instead, it should keep track of the total number of sessions crossing it and 
should calculate its "advertized rate" as if all sessions were one-way only, but 
the total capacity of the link were $\frac{C}{1+k}$ rather than $C$. In addition, the control 
fields of the feedback packets should not be changed by the switches. It can 
be shown that with these modifications the algorithm will produce maxmin 
optimal rate allocation in a network with full-duplex links as well. 

8 Summary and Areas for Future Research 

We have described an algorithm for distributed asynchronous computation of 
maxmin optimal session transmission rates. We have shown that the algorithm 
converges to the optimal rates faster than other algorithms achieving a similar 
objective and that it is well-behaved in transience. The algorithm is self- 
stabilizing in the presence of dynamic network changes and heavy packet loss. 
Unlike previous work, this algorithm takes bandwidth consumed by feedback 
traffic into consideration. 

We have argued that though a field in the packet header utilized by 
the algorithm constitutes undesirable overhead, it also provides significant 
flexibility of potential application. 

Note that the algorithm can be easily extended to the case when in- 
stead of end-to-end feedback the switch sends a feedback packet to the previous 
switch on its way to inform it about its control value if it is smaller. Such hop- 
by-hop feedback will propagate the minimum link control value (advertized 
rate) to the source faster than the end-to-end scheme. Investigating the be- 
havior of this algorithm with hop-by-hop feedback might be a topic for future 
research. 

Another possibility to improve convergence time of the algorithm might 
be to restrict allowed transmission rates to some discrete values. While this 
would certainly decrease the number of potential different values of the optimal 
vector and thus improve convergence time, the effects of such discretization 
are not very clear and need further research. 

We also believe that the transient behavior of the algorithm needs to be 
further examined in more extensive simulation and perhaps in some real-life 
environment. The results of such investigation may be crucial to determine 
practical applicability of the algorithm. 

9 Appendix 1 

Summary of Notation 

$\mathcal{R}$ Set of switches in the network. 

$\mathcal{L}$ Set of (half-duplex) links in the network (assumed even). 

$\mathcal{S}(t)$ Set of sessions in the network at time $t$. 

R Number of Switches in the network. 

L Number of (half-duplex) links in the network. 

S(t) Number of sessions in the network at time t. 

$0 < k < 1$ Ratio of session reception rate to acknowledgement transmission 
rate (assumed constant throughout the network). 

$V(i)$ Mapping of the set $\{1, \dots, L\}$ into itself such that 

• $V(i) = j \Leftrightarrow V(j) = i$ 

• $V(i) \neq i$ 

Mapping $V$ breaks the set of all indices of the half-duplex links in the 
network into $L/2$ pairs, such that each pair contains the indices of two 
half-duplex links corresponding to one full-duplex link. 

$u_{ij}$ Set to 1 if session $i$ crosses link $j$ on its forward path, set to 0 otherwise. 

$w_{ij}$ Set to 1 if session $i$ crosses link $j$ on its feedback path, set to 0 otherwise. 

$\delta_i^j = u_{ij} + k w_{ij}$ 

$\mathcal{F}_j(t)$ Set of sessions crossing link $j$ on the forward path at time $t$. 

$\mathcal{B}_j(t)$ Set of sessions crossing link $j$ on the feedback path at time $t$. 

$\mathcal{G}_j(t) = \mathcal{F}_j(t) \cup \mathcal{B}_j(t)$ Set of all sessions crossing link $j$ at time $t$. 

$f_j(t)$ Number of forward sessions crossing link $j$ at time $t$. 

$b_j(t)$ Number of feedback sessions crossing link $j$ at time $t$. 

$C_i$ Capacity of link $i$. 

$h_i$ Number of links (hops) in the route of session $i$. 

n % - Index of the j-th link in the route of session i. 

$A_i$ Set of half-duplex links incoming to switch $i$. 

$\eta_i(t)$ Actual transmission rate of session $i$ at time $t$. 

$\lambda_{i,j}(t)$ Actual throughput of session $i$ across link $j$ at time $t$. 

$\psi_{i,j}(t)$ Actual throughput of acknowledgement of session $i$ across link $j$ at 
time $t$. 

Vj{t) Actual service rate of link j at time t 

$\phi_i^j(t)$ The queue in the output buffer of link $i$ due to session $j$ only at time $t$. 

$\mathcal{L}_i$ Set of bottleneck links of iteration $i$ of Procedure Global Optimum. 

$\mathcal{S}_i$ Set of sessions crossing all links in $\mathcal{L}_i$ whose optimal rate is assigned at 
iteration $i$ of Procedure Global Optimum. 

$\tau_i$ Optimal rate of sessions in $\mathcal{S}_i$. 

$f_i^l$ Number of sessions of $\mathcal{S}_i$ crossing link $l$ on the forward path. 

$b_i^l$ Number of sessions of $\mathcal{S}_i$ crossing link $l$ on the feedback path. 

10 Appendix 2 



Proof of Theorem 2.1. 

At every iteration of the procedure at least one new link is chosen as 
the bottleneck link of the reduced network. All sessions crossing this link are 
added to $\mathcal{S}_i$. Note that none of these sessions have been added to $\mathcal{S}_j$ at any 
iteration $j < i$. Thus, at the end of each iteration the number of sessions with an 
assigned rate increases by at least one. The procedure terminates when all sessions 
in $\mathcal{S}$ have been assigned, so at most $S$ iterations are needed to terminate. This proves Statement 
1 of the theorem. 

We now prove statement 2. First note that by the way $\tau_i$ is defined, 
$\tau_1 < \tau_2 < \dots < \tau_m$. 

Let $n_i$ be the number of sessions in $\mathcal{S}_i$. Renumber the sessions in $\mathcal{S}$ 
so that the first $n_1$ indices correspond to sessions in $\mathcal{S}_1$, the next $n_2$ indices 
correspond to sessions in $\mathcal{S}_2$, etc. Then the rate allocation obtained by Procedure 
Global Optimum can be written as 

$$\beta = (\beta_1, \dots, \beta_S) = (\underbrace{\tau_1, \dots, \tau_1}_{n_1}, \dots, \underbrace{\tau_m, \dots, \tau_m}_{n_m})$$ 

Let $\alpha = (\alpha_1, \dots, \alpha_S)$ be any other feasible rate allocation. We show that $\beta$ 
is lexicographically greater than $\alpha$. Let $\alpha$ be permuted so 
that $\alpha_1 \le \alpha_2 \le \dots \le \alpha_S$. Suppose $\alpha_1 > \tau_1$. Then consider any link $l \in \mathcal{L}_1$. 
Note that by operation of Procedure Global Optimum $\tau_1 = \frac{C_l}{f_1^l + k b_1^l}$. 
The total throughput across link $l$ is 

$$\sum_i \alpha_i \delta_i^l \ge \alpha_1 (f_1^l + k b_1^l) > \tau_1 (f_1^l + k b_1^l) = C_l$$ 

where $\delta_i^l = u_{il} + k w_{il}$. 

Thus, the feasibility condition is violated, so it must be that $\alpha_1 \le \tau_1$. If 
$\alpha_1 < \tau_1$, then $\beta$ is lexicographically greater than $\alpha$. Suppose $\alpha_1 = \tau_1$. 

We now show that if $\alpha_j = \beta_j \ \forall\, 1 \le j < i$ then $\alpha_i \le \beta_i$. Let $1 \le k \le m$ 
be the index s.t. $\beta_i = \tau_k$. Consider any link $l \in \mathcal{L}_k$. Note that by operation of 
the Procedure only sessions from $\mathcal{S}_1, \dots, \mathcal{S}_k$ cross $l$. Also, for this $l$ 

$$\tau_k = \frac{C_l - \sum_{j=1}^{k-1} \tau_j (f_j^l + k b_j^l)}{f_k^l + k b_k^l}$$ 

Suppose $\alpha_i > \tau_k$. Denote by $Q$ the set of sessions $j$ crossing link $l$ s.t. $\beta_j = 
\tau_k$. Note that $Q$ is just the set of sessions of $\mathcal{S}_k$ crossing $l$. Then the total 
throughput across link $l$ is 

$$\sum_{j \in \mathcal{G}_l} \alpha_j \delta_j^l = \sum_{j=1}^{k-1} \tau_j (f_j^l + k b_j^l) + \sum_{j \in Q} \alpha_j \delta_j^l \ge \sum_{j=1}^{k-1} \tau_j (f_j^l + k b_j^l) + \alpha_i \sum_{j \in Q} \delta_j^l > \sum_{j=1}^{k-1} \tau_j (f_j^l + k b_j^l) + \tau_k (f_k^l + k b_k^l) = C_l$$ 

The last equality follows from (5). Thus, the feasibility condition is violated 
again and it has been proved that it must be that $\alpha_i \le \beta_i$. If the inequality is 
strict, then $\beta$ is lexicographically greater than $\alpha$. Thus it follows by induction 
that $\beta$ is lexicographically greater than any other feasible rate allocation vector, 
so $\tau_i$ are indeed optimal rates. 

This concludes the proof of Statement 2 of the theorem. The fact 
of Statement 3 of the theorem that $\tau_1 < \dots < \tau_m$ simply follows from the 
definition of the bottleneck link of iteration $i$. 

The claims of Statements 4, 5 and 6 of the theorem immediately follow 
from the operation of Procedure Global Optimum. 

The proof of Theorem 2.1 is now complete. 
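
The bottleneck-by-bottleneck construction used in this proof can be illustrated with a small sketch. It is written from the description in this appendix rather than from the statement of Procedure Global Optimum itself, so the function name `global_optimum`, its arguments and the two-link topology below are all assumptions made for illustration. Floating-point ties between bottleneck links are compared exactly, which is adequate for a toy example only.

```python
def global_optimum(capacity, forward, feedback, k):
    """Sketch of an iterative max-min computation.

    capacity: {link: capacity}
    forward, feedback: {session: set of links on that path}
    k: weight of feedback traffic relative to forward traffic
    Assumes every session crosses at least one link.
    """
    def delta(s, l):
        # delta = u + k * w for session s on link l
        return (1.0 if l in forward[s] else 0.0) + (k if l in feedback[s] else 0.0)

    rates, remaining, active, iterations = {}, dict(capacity), set(forward), 0
    while active:
        iterations += 1
        # Share each link could still give to its unassigned sessions.
        share = {}
        for l, c in remaining.items():
            load = sum(delta(s, l) for s in active)
            if load > 0:
                share[l] = c / load
        tau = min(share.values())
        bottlenecks = {l for l, v in share.items() if v == tau}
        fixed = {s for s in active if any(delta(s, l) > 0 for l in bottlenecks)}
        for s in fixed:
            rates[s] = tau
            for l in remaining:
                remaining[l] -= tau * delta(s, l)
        active -= fixed
    return rates, iterations


# Hypothetical topology: sessions 3 and 4 cross links "a" and "b",
# sessions 1 and 2 cross link "b" only; feedback traffic is ignored (k = 0).
capacity = {"a": 10000.0, "b": 30000.0}
forward = {1: {"b"}, 2: {"b"}, 3: {"a", "b"}, 4: {"a", "b"}}
feedback = {s: set() for s in forward}
rates, iterations = global_optimum(capacity, forward, feedback, k=0.0)

# Another feasible allocation; sorted vectors are compared lexicographically.
other = {1: 10000.0, 2: 10000.0, 3: 4000.0, 4: 6000.0}
is_greater = sorted(rates.values()) > sorted(other.values())
```

Here the procedure needs two iterations for four sessions, in line with Statement 1, and the resulting vector is lexicographically greater than the alternative feasible allocation, in line with Statement 2.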

11 Appendix 3 

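Proof of Lemma 4.4. By Lemma 4.3 $\exists\, t_1 > 0$ s.t. $\forall\, t \ge t_1$ 

$$\mu_l(t) \ge \tau_1 \quad \forall\, l \in \mathcal{L} \qquad (6)$$ 

$$\mu_l(t) > \tau_1 \quad \forall\, l \in \mathcal{L} \setminus \mathcal{L}_1 \qquad (7)$$ 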

It will now be shown that eventually all packets of any session in $\mathcal{S}_1$ 
will have stamped rate at least as great as $\tau_1$, any packet of any session not 
in $\mathcal{S}_1$ will have stamped rate strictly above $\tau_1$, all sessions in $\mathcal{S}_1$ have recorded 
rates at least as great as $\tau_1$ at all links in their route, and all sessions not in 
$\mathcal{S}_1$ have recorded rates strictly greater than $\tau_1$ at all links in their route. That 
is, $\exists\, t_2$ such that the following inequalities hold $\forall\, t \ge t_2$: 

$$\begin{aligned}
\zeta_j^l &\ge \tau_1 && \forall\, l \text{ in route of } j \in \mathcal{S}_1 \\
\rho_p &\ge \tau_1 && \text{for any packet } p \text{ of } j \in \mathcal{S}_1 \\
\zeta_j^l &> \tau_1 && \forall\, l \text{ in route of } j \notin \mathcal{S}_1 \\
\rho_p &> \tau_1 && \text{for any packet } p \text{ of } j \notin \mathcal{S}_1
\end{aligned} \qquad (8)$$ 

Note that the last two inequalities of (8) are exactly statements 1 
and 2 of this lemma. 

To see that (8) holds, consider some $j \in \mathcal{S}_1$. Consider the first packet 
of $j$ to leave the source after time $t_1$. When feedback to that packet returns 
to the source at some time $t_j^1 > t_1$, two cases are possible: 

Case 1. 

The 'u-bit' of the packet is set. Then its stamped rate $\rho_p(t_j^1)$ must be 
set to the minimum of the advertised rates of the links seen on the packet's 
route. By (6) and (7) it then must be that $\rho_p(t_j^1) \ge \tau_1$. Hence at the source 

$$\rho_s(t_j^1) \ge \tau_1$$ 

Case 2. 

The 'u-bit' is not set. Then by operation of the algorithm $\rho_s(t_j^1) = \infty > 
\tau_1$. 

Note that since individual sessions are served in FIFO order, any feedback 
packet arriving at the source after time $t_j^1$ corresponds to a forward packet 
which left the source after time $t_1$. Applying the above argument to any such 
packet we can see that 

$$\forall\, t \ge t_j^1 \quad \rho_{s_j}(t) \ge \tau_1 \qquad (9)$$ 

Note that condition (9) holds for any session in $\mathcal{S}_1$ at all times after 
it has completed its first round-trip. 

Consider any packet of $j$ which left the source after $t_j^1$. Let $t_j^2$ be the 
time when feedback to this packet returns to the source. Any packet of $j$ 
which is in the network at any time after $t_j^2$ left the source after time $t_j^1$. By 
operation of the algorithm, at any link $l$ in its route its stamped rate is set to 
$\rho^{new} = \min(\mu_l, \rho^{old})$. By (9) its initial rate is set to $\rho^{init} = \rho_s \ge \tau_1$, so by (6) 
and (7) it must be that for any packet $p$ of session $j$ the second inequality of 
(8) holds for all $t \ge t_j^2$. 

By operation of the algorithm £j is set to the last seen p p of session 
j. Let i| denote the time when first feedback packet to any forward packet 
of j sent after t 2 returned to source. Then the recorded rate of session j at 
any link on its way was set after time t 2 -. It has already been shown that all 
packets of j have stamped rates above or equal T\ for any time t > t 2 . Hence, 
by operation of the algorithm and from 6 and 7 the first 8 holds for all t>t?. 

Thus, taking t 2 = maxj^s^ we have shown that the first two inequal- 
ities of (8) hold for all t > t 2 . 

By Theorem 2.1 any session $j \in \mathcal{S} \setminus \mathcal{S}_1$ does not go through any link 
$l \in \mathcal{L}_1$. Repeating the exact same argument as above for $j \in \mathcal{S} \setminus \mathcal{S}_1$ and taking 
(7) into account, it can be shown that the third and fourth inequalities of (8) 
hold as well. 

Note that time $t_2$ is the time when all sessions in $\mathcal{S}_1$ have completed 2 
round-trips. 

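To prove statement 3 of the lemma, consider any link $l \in \mathcal{L}_1$. By 
Theorem 2.1 only sessions from $\mathcal{S}_1$ cross any link of $\mathcal{L}_1$. Then, if not all 
sessions are marked, 

$$\mu_l = \frac{C_l - \sum_{j \in \mathcal{S}_1} \zeta_j^l a_j^l \delta_j^l}{f_1^l + k b_1^l - \sum_{j \in \mathcal{S}_1} a_j^l \delta_j^l}$$ 

By Theorem 2.1 $\tau_1 = \frac{C_l}{f_1^l + k b_1^l}$ for any $l \in \mathcal{L}_1$. Hence for any $l \in \mathcal{L}_1$ 

$$C_l = \tau_1 (f_1^l + k b_1^l) \qquad (10)$$ 

Note that (10) and (8) imply that $\forall\, t \ge t_2$ and $\forall\, l \in \mathcal{L}_1$ 

$$\mu_l = \frac{\tau_1 (f_1^l + k b_1^l) - \sum_{j \in \mathcal{S}_1} \zeta_j^l a_j^l \delta_j^l}{f_1^l + k b_1^l - \sum_{j \in \mathcal{S}_1} a_j^l \delta_j^l} \le \frac{\tau_1 (f_1^l + k b_1^l) - \tau_1 \sum_{j \in \mathcal{S}_1} a_j^l \delta_j^l}{f_1^l + k b_1^l - \sum_{j \in \mathcal{S}_1} a_j^l \delta_j^l} = \tau_1$$ 

By (6) $\mu_l \ge \tau_1$ for all $l \in \mathcal{L}_1$, so statement 3 of the lemma follows for 
any $t \ge t_2$. 

The case when all sessions are marked is quite similar and is omitted 
here. 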

To see that statement 4 holds, note that by Theorem 2.1 $j$ crosses 
at least one link of $\mathcal{L}_1$, so by (8) and statement 3 of this lemma feedback to 
any packet which left the source after time $t_2$ returns to the source with the 
stamped rate set to $\tau_1$. Denote the time of the return of this feedback to the 
source by $t_j^3$. 

It then follows that for any $t \ge t_j^3$ $\rho_s$ is set to $\tau_1$, so statement 4 of the 
lemma holds for all $t \ge t_3 = \max_{j \in \mathcal{S}_1} t_j^3$. 

The proof of statement 5 is almost identical to the proof of statement 
4. Statement 6 follows from statements 2 and 3, which are already proved. 

Note that $t_3$ is the time when all sessions in $\mathcal{S}_1$ have completed their 
third round-trip. 

Finally, it follows from conditions 1-6 and the operation of the algorithm 
that any session $j \in \mathcal{S}_1$ will be marked at every link on its way as soon as the 
first packet of $j$ sent after time $t_j^3$ passes that link and will remain marked 
ever after as long as the set of sessions remains unchanged. Thus, there exists 
time $t_4$ such that statement 7 of this lemma holds for all $j$. 

This completes the proof of Lemma 4.4. 

Note that $t_4$ is the time when all sessions in $\mathcal{S}_1$ have completed their 
fourth round-trip. We have just shown that it takes at most four round-trips 
of the "slowest" session in $\mathcal{S}_1$ to ensure that all sessions in $\mathcal{S}_1$ have reached 
their optimal rates and are marked with their optimal rates at all links in 
their routes. This result will be used in section 5 to obtain an upper bound 
on convergence time. 
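
The stamping rule that the proof relies on, in which each link replaces the stamped rate of a packet by the minimum of its own advertised rate and the rate already stamped, can be sketched as follows. The function name `stamp_along_route` and the numeric values are hypothetical. The sketch covers only this one rule and leaves out marking, recorded rates and the 'u-bit'.

```python
import math


def stamp_along_route(initial_rate, advertised_rates):
    """Apply new = min(advertised, old) at every link of the route, in order."""
    stamped = initial_rate
    for advertised in advertised_rates:
        stamped = min(advertised, stamped)
    return stamped


# A packet leaving the source with an unbounded estimate picks up the
# smallest advertised rate on its route.
returned = stamp_along_route(math.inf, [8000.0, 5000.0, 12000.0])

# If the initial rate and every advertised rate are at least tau_1,
# the stamped rate can never fall below tau_1.
tau_1 = 5000.0
bounded = stamp_along_route(7000.0, [6000.0, 5000.0, 9000.0]) >= tau_1
```

The second call mirrors the step in which (6), (7) and (9) together give the second inequality of (8).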

Session/time    0-27     27-42    42-63    63-86    86-100 
1               11000    -        -        -        - 
2               11000    11000    -        -        11000 
3               11000    11000    -        2000     2000 
4               11000    11000    11000    11000    11000 
5               11000    11000    11000    11000 

Table 1: Experiment 1: Demand of sessions 1-5 at different times. 

Table 2: Experiment 1: Optimal rates of sessions 1-5 at different times. 

Table 3: Experiment 1: Number of round-trips required for sessions 1-5 to 
stabilize to new optimal rates after changes in network load 1-5. 

Figure 2: Experiment 1: Session 1. Boxes correspond to round-trip epochs. 
This session reaches its optimal rate after the first round-trip; it exits at time 
27 and never returns. 

Figure 3: Experiment 1: Session 2. Boxes correspond to round-trip epochs. 
This session takes 2 round-trips to stabilize to its original optimal rate of 4000. 
At time 27 session 1 exits, and this session restabilizes to the new optimal rate 
of 5000. At time 42 the session exits and re-enters at time 86, stabilizing to 
its new optimal rate of 6000. 

Figure 4: Experiment 1: Session 3. Boxes correspond to round-trip epochs. 
This session exits at time 42 and re-enters at time 63 with a low demand of 
2000. Its behavior up to time 42 is similar to Session 2. Since its demand 
after time 63 is lower than the equal share of all sessions in the network, its 
optimal rate is exactly equal to its demand. 

Figure 5: Experiment 1: Session 4. Boxes correspond to round-trip epochs. 
This session does not exit during the experiment. At times 27, 42, 63 and 86 
other sessions enter or exit. It can be seen that the session quickly stabilizes 
to the new optimal rates. 

Figure 6: Experiment 1: Session 5. Boxes correspond to round-trip epochs. 
The behavior of this session is similar to Session 4. 
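
The rates quoted in the captions of Figures 3 and 4 can be reproduced with a small sketch of demand-limited sharing on one link. The single shared link and its capacity of 20000 are assumptions inferred from the quoted rates (4000 with five sessions, 5000 with four), not values stated in this section. Feedback traffic is ignored and the function name `single_link_rates` is hypothetical.

```python
def single_link_rates(capacity, demands):
    """Max-min rates on one link: a session whose demand is below the equal
    share gets its demand, the others split the remaining capacity equally."""
    rates = {}
    remaining = capacity
    pending = dict(demands)
    while pending:
        share = remaining / len(pending)
        satisfied = {s: d for s, d in pending.items() if d <= share}
        if not satisfied:
            for s in pending:
                rates[s] = share
            break
        for s, d in satisfied.items():
            rates[s] = d
            remaining -= d
            del pending[s]
    return rates


ASSUMED_CAPACITY = 20000  # assumption, see the note above

# Interval 0-27: all five sessions have a demand of 11000.
start = single_link_rates(ASSUMED_CAPACITY, {s: 11000 for s in range(1, 6)})

# Interval 27-42: session 1 has left.
after_exit = single_link_rates(ASSUMED_CAPACITY, {s: 11000 for s in range(2, 6)})

# Interval 63-86: session 3 is back with a demand of 2000.
low_demand = single_link_rates(ASSUMED_CAPACITY, {3: 2000, 4: 11000, 5: 11000})
```

Under these assumptions session 3 receives exactly its demand of 2000 in the last case, which is the situation described in the caption of Figure 4.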

Table 4: Experiment 2: Session demands (optimal rates are exactly session 
demands in this experiment). 

Figure 7: Experiment 2: Configuration. Numbers next to the links are 
capacities ($\times 10^{-3}$). 

Figure 8: Experiment 2: Sessions 1-8. 

Figure 9: Experiment 2: Sessions 9-12. 

Figure 10: Experiment 2: Sessions 12-15. 







Figure 11: Experiment 3: Configuration. Demands of all sessions are large 
(40000). Numbers in italics next to the links are capacities ($\times 10^{-3}$). The 
bottleneck links are numbered by bold figures in a box. Link 1 is the 
bottleneck for sessions 3 and 4, while links 2 and 3 are bottlenecks for sessions 
1 and 2. Optimal rate of sessions 1 and 2 is 10000, optimal rate of sessions 3 
and 4 is 5000. 

Table 5: Experiment 4: Optimal rates of sessions 1-4. 

Figure 12: Experiment 3: Session 1 (optimal rate 10000). 

Figure 13: Experiment 3: Session 2 (optimal rate 10000). 

Figure 14: Experiment 3: Session 3 (optimal rate 5000). 

Figure 15: Experiment 3: Session 4 (optimal rate 5000). 

Figure 16: Experiment 4: Configuration. 

Figure 17: Experiment 4: Sessions 1-4, Simultaneous start. 

Figure 18: Experiment 5: Configuration. Sessions 1-5 enter at time 0. Then 
session 3 exits at time 15. Then sessions 1 and 2 exit at time 48. Finally session 
1 reenters at time 67. Sessions 2, 3 and 4 share the first-level bottleneck link 
5. Session 1 is bottlenecked by link 1. Finally, session 5 is bottlenecked by 
link 4. Capacities of the links are given in italics ($\times 10^{-3}$). Bottleneck links 
are numbered in boxed bold print. Links which are not numbered are not 
bottlenecks. 

Table 6: Experiment 5: Optimal rates of sessions 1-5 at different times. 

Table 7: Experiment 5: Number of round-trips required for sessions 1-5 to 
stabilize to new optimal rates after changes in network load 1-5. 

Figure 19: Experiment 5: Session 1. Exits at time 48 and reenters at time 67. 

Figure 20: Experiment 5: Session 2. Exits at time 48. 

Figure 21: Experiment 5: Session 3. Exits at time 15. 

Figure 22: Experiment 5: Session 4. 

Figure 23: Experiment 5: Session 5. 